Abstract

Universal hash functions that exhibit $c \log n$-wise independence are shown to give a performance
in double hashing, uniform hashing and virtually any reasonable generalization of double
hashing that has an expected probe count of $\frac{1}{1-\alpha} + O(\frac{1}{n})$ for the insertion of the $\alpha n$-th item into
a table of size $n$, for any fixed $\alpha < 1$. This performance is optimal. These results are derived
from  a  novel  formulation  that  overestimates  the  expected  probe  count  by  underestimating  the 
presence  of  local  items  already  inserted  into  the  hash  table,  and  from  a  very  sharp  analysis  of 
the  underlying  stochastic  structures  formed  by  colliding  items. 

Analogous  bounds  are  attained  for  the  expected  r-th  moment  of  the  probe  count,  for  any 
fixed  r,  and  linear  probing  is  also  shown  to  achieve  a  performance  with  universal  hash  functions 
that  is  equivalent  to  the  fully  random  case. 

Categories  and  Subject  Descriptors:  E.l  [Data]:  Data  Structures — arrays ;  tables;  E.2  [Data]:  Data  Storage 
Representations — hash-table  representations;  F2.2  [Analysis  of  Algorithms  and  Problem  Complexity]: 
Nonnumerical  Algorithms  and  Problems — sorting  and  searching. 

General  terms:  Algorithms,  Theory. 

Additional  Key  Words  and  Phrases:  Clustering,  Double  hashing,  hashing,  limited  independence,  linear  probing, 
open  addressing,  random  probing,  uniform  hashing,  universal  hash  functions. 
1. Background

Hashing covers a variety of schemes to maintain an associative lookup table $L[0..n-1]$ for a set
of keys that belong to some large universe $U$. In closed hashing, access to a key $x$ is achieved
by following a sequence $p(x,1), p(x,2), p(x,3), \ldots$ of computed probe indices that belong to
the range $[0..n-1]$. In particular, a key $x$ is inserted into the hash table $L$ by storing the key
in the first vacant table slot among $L[p(x,j)]$, for $j = 1, 2, \ldots$. Retrieval is achieved by probing
according to the same sequence of computed indices, until either $x$ is found or a vacant table
location is encountered.
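The insertion and retrieval rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `insert` and `lookup`, the callable `probe(key, j)` standing in for $p(x,j)$, and the cap on the number of probes are assumptions made here, not part of the paper.

```python
from typing import Callable, List, Optional

Probe = Callable[[int, int], int]  # probe(key, j) plays the role of p(x, j)


def insert(table: List[Optional[int]], key: int, probe: Probe) -> int:
    """Store key in the first vacant slot on its probe sequence.

    Returns the number of probes used. The cap of 10 * len(table) probes
    is an assumption of this sketch, used only to guarantee termination.
    """
    for j in range(1, 10 * len(table) + 1):
        slot = probe(key, j)
        if table[slot] is None:
            table[slot] = key
            return j
    raise RuntimeError("no vacant slot found on the probe sequence")


def lookup(table: List[Optional[int]], key: int, probe: Probe) -> bool:
    """Follow the same probe sequence until key or a vacant slot is found."""
    for j in range(1, 10 * len(table) + 1):
        slot = probe(key, j)
        if table[slot] is None:
            return False
        if table[slot] == key:
            return True
    return False
```

Any probe function with values in $[0..n-1]$ can be plugged in; the value returned by `insert` is the probe count whose expectation the paper studies.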

Definition 1.

• Let $D = (x_1, x_2, \ldots, x_{\alpha n})$ be a sequence of keys from the universe of integers $U = [0..m-1]$.

• The members of $D$ will be sequentially hashed into table $L[0..n-1]$.

• We say that $x_k$ is hashed at time $k$.

• We say that $x_k$ is embedded at location $\ell$ if the hashing of $D$ stores $x_k$ in $L[\ell]$.

• Suppose $T$ is an increasing subsequence of indices $T \subseteq (1, 2, \ldots, \alpha n)$. Let $D_T$ denote the
subsequence of keys $(x_t)_{t \in T}$.

• For a sequence $S$ and scalar $t$, $S_t$ denotes the $t$-th item in $S$, so that $D_t = x_t$.

• Let the number of probes needed to insert $x_{\alpha n}$ be denoted by $probe_{\alpha n}$.

This paper analyzes $E[probe_{\alpha n}]$, the expected number of probes needed to insert $x_{\alpha n}$.

Uniform hashing is an idealized model where the probe sequence $p(x, *)$, for each key $x \in U$,
is assumed to be a fully independent random function (or permutation).

Traditional double hashing defines $p(x,j) = (f(x) - (j-1)d(x)) \bmod n$, where the table size
$n$ is prime, $f(x)$ is assumed to return an arbitrarily selected integer in $[0..n-1]$, and $d(x)$ is an
arbitrarily selected value in $[1..n-1]$. The random functions $\{(d(x), f(x))\}_{x \in U}$ are assumed to
be fully independent and uniformly distributed over their respective ranges. The probe scheme
originates in the 1968 Ph.D. thesis of Guy de Balbine [11].
Linear probing uses one independent random function, and the probe sequence is defined
by $p(x,j) = (f(x) + 1 - j) \bmod n$.
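As a concrete illustration of the two probe formulas, the following sketch evaluates them for given values of $f(x)$ and $d(x)$. The function names and the convention of passing the hash values directly as arguments are assumptions of this sketch; the paper treats $f$ and $d$ as random functions of the key.

```python
def double_hash_probe(f_x: int, d_x: int, j: int, n: int) -> int:
    """p(x, j) = (f(x) - (j - 1) d(x)) mod n, with d(x) in [1..n-1]."""
    return (f_x - (j - 1) * d_x) % n


def linear_probe(f_x: int, j: int, n: int) -> int:
    """p(x, j) = (f(x) + 1 - j) mod n."""
    return (f_x + 1 - j) % n


if __name__ == "__main__":
    n = 7  # a prime table size
    seq = [double_hash_probe(3, 2, j, n) for j in range(1, n + 1)]
    print(seq)  # [3, 1, 6, 4, 2, 0, 5]
    print([linear_probe(3, j, n) for j in range(1, n + 1)])  # [3, 2, 1, 0, 6, 5, 4]
```

Because $n$ is prime and $d(x) \neq 0$, the first $n$ double hashing probes visit every table slot exactly once, which is why the primality requirement appears in the definition.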

Tertiary clustering is an idealized model where a random function $h$ is first used to map
the data $D$ into the interval $[0, n^2]$, and the image multiset is then hashed via idealized uniform
hashing. The point of this formulation is to model circumstances where a pair of distinct hash
keys might receive the exact same random probe sequence with probability $\frac{1}{n^2}$, as opposed to
the even less feasible model of pure uniform hashing, where the probability is $\frac{1}{n!}$.

Prior work on double hashing includes the results of Guibas and Szemerédi [9], Lueker and
Molodowitch [12], and Schmidt and Siegel [16]. Lueker and Molodowitch presented a very elegant
proof to show that for random functions $f$ and $d$ and any fixed load factor $\alpha < 1$, the expected
number of probes to insert the $(\alpha n + 1)$-st item is $\frac{1}{1-\alpha}$ plus a lower-order error term, which is asymptotically
equivalent to uniform hashing, and hence, by the result of Yao [21], optimal. Previously, Guibas
and Szemerédi had established a comparable bound for loads $\alpha <$

Schmidt and Siegel showed that if the probe functions $f$ and $g$ are selected from a set of
universal hash functions that exhibit $(c \log n)$-wise independence, then double hashing will have
an expected insertion performance that is at most $\frac{1}{1-\alpha} + \epsilon$, for any fixed $\alpha < 1$, $\epsilon > 0$ and a large
enough constant $c$. Consequently, nearly optimal performance can be achieved with functions $f$
and $g$ that are computed by a program that uses a small number of random seeds. The results
are  established  for  a  class  of  probe  schemes  that  is  considerably  larger  than  the  arithmetic 
progressions  of  double  hashing.  Further  information  on  previous  work  and  the  implications  and 
formalizations  of  limited  randomness  can  be  found  in  [16]. 

We now build upon the results and the framework of [16] to show that double hashing has
an expected probe performance of $\frac{1}{1-\alpha} + O(\frac{1}{n})$, for any fixed $\alpha < 1$. This error bound is new even
in the case of full randomness. More importantly, it is shown to hold even when $(c \log n)$-wise
independent hash functions are used, where $c$ is a constant that depends on $\alpha$. In addition,
comparable bounds are attained for the expected moments of the probe counts. These results
also hold for the generalized double hashing schemes of [16].
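A small Monte Carlo experiment gives a feel for the leading term $\frac{1}{1-\alpha}$. The sketch below is not the paper's construction: it draws $f(x)$ and $d(x)$ from Python's pseudo-random generator (so it models the fully random case, not a $(c \log n)$-wise independent family), and the function names, table size, trial count, and seed are all assumptions chosen for illustration.

```python
import random


def probes_for_last_insert(n: int, load: float, rng: random.Random) -> int:
    """Fill a table of prime size n by double hashing with random (f, d)
    and return the probe count of the final insertion (item int(load * n))."""
    occupied = [False] * n
    probes = 0
    for _ in range(int(load * n)):
        slot = rng.randrange(n)      # f(x) in [0..n-1]
        d = rng.randrange(1, n)      # d(x) in [1..n-1]
        probes = 1
        while occupied[slot]:        # terminates: n is prime and the table is not full
            slot = (slot - d) % n
            probes += 1
        occupied[slot] = True
    return probes


def estimate(n: int = 101, load: float = 0.5, trials: int = 4000, seed: int = 1) -> float:
    """Average probe count of the last insertion over independent trials."""
    rng = random.Random(seed)
    total = sum(probes_for_last_insert(n, load, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print(estimate(), "vs 1/(1 - alpha) =", 1 / (1 - 0.5))
```

With these parameters the average should land near 2; a simulation of this size cannot resolve the $O(\frac{1}{n})$ error term, which is the subject of the analysis.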
The objectives underlying our generalized definition of double hashing are to include double
hashing and uniform hashing within the same model, and to characterize the basic probe strategies
that enable these schemes to achieve such good performance, both for fully independent
hash functions and for those exhibiting limited independence.

Definition 2: The models $UH$, $DH$, and $DH_\psi$.

• In $UH$, the probe sequence $p(x,*)$ is an independent family of random variables that are
uniformly distributed over $[0, n-1]$. Any collection of sequences $p(x_1,*), p(x_2,*), \ldots, p(x_n,*)$
are mutually independent, for distinct $x_i$.

•  DH  relaxes  the  requirement  that  each  individual  probe  sequence  be  fully  random. 

1.  Each  individual  probe  sequence  p(x,*)  exhibits  approximate  pairwise  independence: 

$\forall x\ \forall i, j,\ i \neq j,\ \forall r, s \in [0, n-1],\ r \neq s:\ Prob\{p(x,i) = r,\ p(x,j) = s\} = \frac{1}{(n - O(1))^2}$.†

2.  Furthermore,  the  random  sequences  {_p(as;} *)}:e6_d  are  mutually  independent.  This  con¬ 
dition  need  only  hold  for  a  subset  of  hash  functions  F  c  F,  where  F  depends  on  D, 
and  }Jj  >  1  -  ^r- 

3.  In  addition,  we  have  the  following  robustness  requirements. 

i)  Extremely  long  probe  sequences  are  quite  rare:  For  a  fixed  c0  that  depends  on  a, 

Va  :  ^2  Prob{\  u|=1  {p{x,  *)}|  <  an  +  1}  = 

t>c0n 

ii)  Probe  sequences  are  unlikely  to  reprobe  locations  too  frequently. 

Vx  Vj  <  k  <  h ,  r  e  [0,  n  -  1]  :  Prob{p(x,j)  —  p(x,  k),  p(x,  h )  =  r}  =  pp- 
Va  V/f  <  i.j  <  k,  (. h ,  i )  F  (. j ,  k)  :  Prob{p(x,  h)  =  p(x,  i ),  p(x,j)  =  p(x,  k)}  = 


• In $DH_\psi$, the statistical probe behavior of an individual probe sequence is subject to the

†We use the Big-Oh notation in the following standard way: $f = g + O(h)$ means that $|f - g| = O(|h|)$.
Consequently, there is no distinction between $f + O(g)$ and $f - O(g)$. Nevertheless, we shall, upon occasion,
use minus signs to suggest that the worst case error is negative. Also, it is quite reasonable to write, say,
=  1  _  0(h) ,  for  h  =  o(l). 
same requirements as in $DH$ for the first $\psi$ probes, the global coverage requirement must
still hold, and the joint distribution of initial probe sequences, for collections of $\psi$ or fewer
probes, is required to be statistically independent, for distinct items. More precisely, we
have the following.

1. $\forall x\ \forall i, j \le \psi,\ i \neq j,\ \forall r, s \in [0, n-1],\ r \neq s:\ Prob\{p(x,i) = r,\ p(x,j) = s\} = \frac{1}{(n - O(1))^2}$.

2.  Knowing  a  limited  number  of  the  probe  values  for  a  small  set  of  keys  gives  no  informa¬ 

tion  about  the  first  few  probes  for  another  key.  Formally,  let  Z  be  a  set  of  keys  (  E  V 
with  associated  probe  count  bounds  where  —  V5-  Let  1,  2;,  ■  ■  ■ ,  J(: }(€z 

be  a  multiset  of  arbitrary  probe  locations.  Then 

Pr°b{  /\  /\  p((,j)  =  KC|i}  =  n  Prob{  /\  p(C,j)  =  KXj}. 

CeZj<jc  C  eZ  j<jg 

This  condition  need  only  hold  for  a  subset  of  hash  functions  F  c  F,  where  F  depends 
on  D,  and  {4r|  >  1  — 

3.  For  some  hxed  c0  that  depends  on  a,  Mx  :  J2t>c0n  T>?’o6{|ul_1  {p(x,  z)}|  <  cm-\- 1}  = 

4.  Vx  Vj  <  k  <  h  <ip,re  [0,  n  -  1]  :  Prob{p(x,j)  =  p(x.  k).  p(x,  h)  =  r}  = 

5.  Mx  V/i.  <  i  <  ip,j  <  k  <  ip1  (h.  i)  ^  (j.  k ): 

Prob{p(x,h)  =p(x,i ),  p(x,j)  =p(x,k )}  = 

According to these definitions, $UH \subset DH \subset DH_\psi$. The exclusionary $r \neq s$ in 1 of the $DH$ and
$DH_\psi$ definitions is given to ensure that standard double hashing lies within $DH$.

In these formal models, a family of hash functions $H$ comprises a finite set of functions.
Given the data sequence $D$, a specific hash function is selected by choosing a member from $H$ at
random according to the uniform distribution. The statistical properties defined by $UH$, $DH$,
and $DH_\psi$ are with respect to $H$, although the restricted subfamilies for $DH$ and $DH_\psi$ are also algorithmically
dependent.

The key notions used in the analysis of these probe schemes were 1) dependency sets, and
2) multiplicative vacancy overestimators ([16]).

Definition  3. 

• The dependency set of a hash key $x_k$ is recursively defined to comprise $x_k$ and the dependency
set members of the previously inserted keys that reside in the locations unsuccessfully
probed on behalf of $x_k$ during its insertion. We denote the dependency set of $x \in D$ (with
respect to $D$) by $dep(x, D)$. The dependency DAG for $x$, $G(x, D)$, has the root $x$; the graph
also contains directed edges from $x$ to the items that reside in the occupied locations visited
during the insertion of $x$, and the recursively defined dependency DAGs for each of these
items. The DAG is a tree if each vertex other than the root has indegree one. A subsequence
$S \subseteq D$ is a local dependency set if $S$ hashes by itself into a dependency set. Thus $x_k$ will
be the root of a unique dependency DAG when $D$ is hashed, but will root many
different local dependency DAGs, in general. A partial dependency DAG $G_r(x, D)$ is the
subgraph of $G(x, D)$ restricted to $x$ and paths from $x$ that begin with one of the first $r$
probes for $x$.

• Given a set of hash functions (or set of statistical properties satisfied by a set of universal
hash functions), a vacancy estimator is a function $q(t)$ that overestimates the probability
that a slot location $\ell$ is vacant at time $t$. The estimate should hold regardless of the value of
$\ell$ and the data in $D$. The estimator is multiplicative if, for any sequence of slots $\ell_1, \ell_2, \ldots, \ell_k$
and corresponding times $t_1, t_2, \ldots, t_k$, the joint probability that slot $\ell_i$ is vacant at time $t_i$,
for $i = 1, 2, \ldots, k$, is at most $(1 + O(k^2/n)) \prod_{i=1}^{k} q(t_i)$. This multiplicativity need only hold for
$k = O(\log n)$.

We will count the expected number of partial dependency DAGs rooted at $x_{\alpha n}$, which means
that root $x_{\alpha n}$ may not yet have found a vacant table slot for insertion. Thus the next probe,
on behalf of $x_{\alpha n}$, will add another branch to the DAG, if the new slot turns out to be occupied.
Let $x_{\alpha n}$ have $r$ children in the DAG $G(x_{\alpha n}, D)$. Then it will have encountered $r + 1$ DAGs.
(The first will have zero children since we do not require the root to be inserted when counting
these structures.) Thus the number of such DAGs actually encountered by $x_{\alpha n}$ is precisely the
number of probes needed to insert the key.
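The recursive definition of a dependency set, and the observation that a root with $r$ children uses $r + 1$ probes, can be traced on a toy example. The sketch below is illustrative only: `hash_sequence`, `dep`, and the deterministic toy probe function are assumptions introduced here, and the code assumes the key sequence is shorter than the table so that every insertion terminates.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def hash_sequence(keys: List[int], n: int,
                  probe: Callable[[int, int], int]
                  ) -> Tuple[List[Optional[int]], Dict[int, List[int]]]:
    """Insert keys in order. Returns the final table and, for each key, the
    occupants found in the locations it probed unsuccessfully."""
    table: List[Optional[int]] = [None] * n
    collided: Dict[int, List[int]] = {}
    for x in keys:
        hits: List[int] = []
        j = 1
        while table[probe(x, j)] is not None:
            hits.append(table[probe(x, j)])
            j += 1
        table[probe(x, j)] = x
        collided[x] = hits
    return table, collided


def dep(x: int, collided: Dict[int, List[int]]) -> Set[int]:
    """Dependency set of x: x plus the dependency sets of the keys residing
    in the locations probed unsuccessfully on behalf of x."""
    result = {x}
    for y in collided[x]:
        result |= dep(y, collided)
    return result


if __name__ == "__main__":
    n = 7
    toy_probe = lambda x, j: (x + 1 - j) % n   # hypothetical: linear probing, f(x) = x mod n
    table, collided = hash_sequence([3, 10, 2, 17], n, toy_probe)
    print(collided[17])                 # [3, 10, 2]: three children of the root 17
    print(len(collided[17]) + 1)        # 4 probes were needed to insert 17
    print(sorted(dep(17, collided)))    # [2, 3, 10, 17]
```

In this run the root 17 has three children, so it is inserted on its fourth probe, matching the count described above.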

A consequence of these definitions is that the expected number of probes to insert the $\alpha n$-th
key $x_{\alpha n}$ is precisely the expected number of local partial dependency DAGs rooted by $x_{\alpha n}$,
where  the  probability  that  a  subsequence  hashes  to  form  a  local  partial  dependency  DAG  is  the 
probability  that  the  colliding  probes  among  the  members  of  the  subsequence  occur  as  specified 
by  the  DAG  structure,  and  the  non-root  nodes  find  their  locally  determined  embedding  locations 
vacant  at  the  times  of  their  insertions  within  D. 

The  difficult  part  of  the  calculation  is  to  provide  an  adequate  overestimate  of  the  joint 
vacancy  probability  for  the  members  of  a  local  dependency  set.  A  good  vacancy  estimate 
is  essential  for  reducing  the  accounting  of  spurious  dependency  sets  that  locally  hash  into  a 
dependency DAG rooted by $x_{\alpha n}$.

Schmidt and Siegel use this counting approach to attain Theorem 1 in [16], which states
that the expected number of probes to insert $x_{\alpha n}$ is at most

$E[probe_{\alpha n}] \le \frac{1}{\sqrt{1 - 2Q(\alpha n)}} + O(1/n)$, (1)

where $Q(k) = \frac{1}{n} \sum_{j=1}^{k} q(j)$ and $q(*)$ is a multiplicative vacancy estimator. This bound, as a
function of $Q$, is established for any hashing procedure that satisfies the statistical properties of
$DH_\psi$, for $\psi > c \log n$, for some fixed constant $c$ that is independent of $n$.

Numerical values for the expected number of probes are attained by quantifying the following
vacancy criterion which, it turns out, is indeed a multiplicative vacancy overestimator, even in
$DH_\psi$.

Definition 4: The vacancy criterion $\mathcal{M}^{(h)}(T, I)$ and its probability $q^{(h)}(T, I)$.

• Let $\mathcal{M}^{(h)}(T, I)$ be the vacancy criterion that says, for each $j$ in $1, 2, \ldots, |I|$, there is no
tuple $S \subseteq (x_1, \ldots, x_{T_j - 1})$ of size $|S| \le h$ that hashes into a dependency tree $G$ rooted
at location $I_j$.

• Let the exact probability of $\mathcal{M}^{(h)}(T, I)$ be denoted by $q^{(h)}(T, I)$.
Informally, location $I_j$ is deemed to be occupied by time $T_j$ only if there is some tuple of $h$ or
fewer items, among $x_1, \ldots, x_{T_j - 1}$, that hashes by itself into a tree that is rooted at $I_j$. We choose
to ignore non-tree DAGs in this definition.

Calculations in [16] show that with $O(\log n)$-wise independence, these multiplicative vacancy
estimators $q^{(h)}$ are, for sufficiently large, fixed $h$, within any fixed $\epsilon$ of the true estimate. Now,
the true vacancy estimate $q(i) = 1 - i/n$ gives the bound $\frac{1}{1-\alpha} + O(1/n)$ for the expected insertion
cost in (1), which is the actual bound for uniform hashing with fully independent random probes.
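The size of the $O(1/n)$ gap for uniform hashing can be checked exactly on small tables. The sketch below assumes the permutation variant of uniform hashing (each probe sequence is a uniformly random permutation of the slots), for which the expected insertion cost with $k$ occupied slots has the classical closed form $\frac{n+1}{n+1-k}$; the function name is an assumption of this sketch.

```python
from fractions import Fraction


def expected_probes_uniform(n: int, k: int) -> Fraction:
    """Exact expected probe count to insert into a table of n slots with k
    occupied, when the probe sequence is a uniformly random permutation."""
    total = Fraction(0)
    all_occupied = Fraction(1)   # Prob{the first j probes all hit occupied slots}
    for j in range(k + 1):
        total += all_occupied
        all_occupied *= Fraction(k - j, n - j)
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        k = n // 2                                   # load factor 1/2
        exact = expected_probes_uniform(n, k)
        gap = Fraction(n, n - k) - exact             # distance from 1/(1 - alpha)
        print(n, exact == Fraction(n + 1, n + 1 - k), float(gap * n))
```

At load $\frac{1}{2}$ the scaled gap `gap * n` stays below 2 for every table size tried, which is the $O(1/n)$ behavior quoted for the fully random case.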

Unfortunately, it is necessary to have an asymptotically exact overestimator if the expected
probe count is to have an error of $O(\frac{1}{n})$. Otherwise our expectation will be augmented by
spurious probe statistics from DAGs other than the actual dependency graph for $x_{\alpha n} \in D$, since
there  are  many  items  that  might  initially  hash  to  a  given  location,  though  only  the  first  will 
actually  reside  there.  Yet,  the  errors  resulting  from  very  accurate  vacancy  formulations  appear 
to  be  rather  difficult  to  bound  satisfactorily. 

1.1  Overview 

We now use the constructions from [16] plus inclusion-exclusion to establish a sharp approximate
isomorphism between double hashing and uniform hashing. The inclusion-exclusion will eliminate
the overcount of (spurious) probe collisions between $x_{\alpha n}$ and previous keys that appear
to  be  inserted  at  locations  that  satisfy  our  weak  vacancy  criterion,  but  are  actually  inserted 
elsewhere,  because  their  apparently  vacant  insertion  slots  will  actually  be  already  occupied  by 
the  time  they  are  hashed  from  D  into  L. 

The  isomorphism  shows  that  double  hashing,  for  example,  admits  a  calculation  for  the 
expected  probe  performance  that  is  the  same  as  that  for  uniform  hashing,  apart  from  a  string 
of  principal  error  terms,  plus  a  few  other  negligible  errors.  Some  care  will  be  needed  to  bundle 
events together in a way that avoids exponential overcounts of these error terms. Then the $k$-th
error term will turn out to be the difference, between uniform hashing and double hashing, in
the expected number of look-ahead restricted $k$-item aggregates (or more precisely, aggregates
restricted by vacancy calculations determined from hashing $h$-item or smaller subtrees from $D$)
of local dependency DAGs rooted by $x_{\alpha n}$. The resulting error will be dominated by the product
of $O(\frac{1}{n})$ and the $k$-th coefficient of a simple generating function $g$ defined from these structures.

The ensuing equation for $g$ depends on the amount of look-ahead that is used but is essentially
independent of $n$. The solution, as the look-ahead parameter† approaches infinity, turns
out to have a radius of convergence that exceeds 1, for any fixed load factor $\alpha < 1$. Hence, for
some fixed look-ahead that depends on $\alpha$, the errors sum to $O(\frac{1}{n})$ and decay rapidly in $k$. Thus
our approach throws the question of asymptotic optimality entirely onto the limit properties of
a simple family of ordinary differential equations that govern $g$. While our use of converging
differential equations is intended to be self-contained, a comprehensive introduction to the subject
can be found in [7].

The resulting isomorphism implies that as $n \to \infty$, there is a limiting probability distribution
on the (unembedded) DAG structures encountered by the collision behavior of the $\alpha n$-th item
and its recursive collision descendants. Moreover, this distribution is identical for generalized
double hashing and uniform hashing (cf. Subsection 3.2).

2.  Superdependency  graphs 

The  thrust  of  our  asymptotically  exact  performance  proof  is  to  analyze  the  statistical  behavior 
of  double  hashing  on  aggregates  of  local  dependency  DAGs.  Consequently,  we  must  extend  the 
notion  of  a  dependency  set  to  these  larger  ensembles  of  data. 

Definition  5. 

• The superdependency set of $x$, $sdep(x, D)$, is the union of all vertex sets belonging to local
dependency sets of $x$: $sdep(x, D) = \bigcup_{D' \subseteq D} dep(x, D')$.

• The superdependency graph of $x$, $G_{sdep}(x, D)$, comprises $sdep(x, D)$ plus the directed edges
that occur from the actual collisions when $sdep(x, D)$ is hashed: If $y$ is the $r$-th member in
the adjacency list of $z$, then the first $r$ probes for $z$ must be to locations already occupied
by members of $sdep(x, D)$, and $y$ must actually be hashed (from the sequence $D$) to the
$r$-th probe location of $z$.

† Actually, the amount of statistical independence that is needed for our parameter $h$ turns out to be a fixed
function of $h$ plus $O(\log n)$, rather than $h$ itself (cf. Lemmas 6, 7 and Theorem 3 of [16]).

• We also define $G_{sup}(x)$ to be the set of superdependency graphs that result from any prefix
of the actual probe sequence for $x$.

• The vertex $x$ will be called the root of $G_{sdep}(x)$, even though the structure may contain
other vertices (that appear earlier in $D$) with indegree 0. Similarly, we shall call a superdependency
graph a tree if it is a tree when the edges are viewed as being undirected. The
outdegree of the root of a superdependency graph $G$ will be denoted by $degr(G)$.

$G_{sdep}(x)$ (and all graphs in $G_{sup}(x)$) are DAGs, but have a more complex structure than
the dependency graph $G(x, D)$ and the partial dependency graphs of $x$. For example, suppose
that $y$ is in $x$'s true dependency set $dep(x, D)$, and suppose that $y$ will be hashed (from $D$) into
a location $\ell$. There may be other nodes in local dependency sets $dep(x, D')$ that would reside in
location $\ell$ and would belong to the dependency set of $x$, were $y$ (and perhaps others) removed
from $D$. We call these vertices dummies. Any vertex (other than root $x$) which has indegree
zero in the DAG is a dummy, since no key in the superdependency set is hashed in a way that
depends upon its presence. More generally, a dummy is a vertex that belongs to some local
dependency set of $x$, but which will actually reside in a later probe location than that indicated
by the local dependency set.

Since limited randomness forces us to examine superdependency structures that are not
maximal, we will also use these notions to refer to a local superdependency set, $sdep(x, D')$,
which is the union of the local dependency sets in $D' \subseteq D$.

When analyzing arbitrary superdependency DAGs, we will need to use a canonical traversal
process,  which  extracts  a  tree-like  subset  of  edges,  so  that  the  resulting  subgraph  is  connected 
and  acyclic,  when  the  edges  are  viewed  as  being  undirected. 

Definition  6. 

Let $G = (V, E)$ be a superdependency DAG. We call $T = (V, E_T)$ an omnidirected spanning
tree if $E_T$ comprises the edges selected as follows. We scan and process the vertices $V$ in order of
decreasing index in $D$. The root is the first vertex processed, and is handled differently from the
rest. Its neighbors in $G$ are immediately discovered and are forced to be its children in $T$. For
specificity, a child receiving multiple probes from the root is taken to be connected by the edge
that represents the highest probe number. Thereafter, we initiate a standard recursive DFS
from the root. The DFS explores the probe edges exiting a vertex in order of decreasing probe
number. Edges taken to newly discovered vertices are entered in $E_T$. When the DFS completes
and returns to its origin, the scan is continued to the next vertex that has not yet been explored
by a DFS. There is one final modification of the DFS. Also in $E_T$ is the very first cross edge
encountered during (some recursive level of) each DFS initiated from the scan of a newly
discovered vertex, except for the DFS initiated from the root.

The  cross  edges  ensure  that  the  resulting  structure  is  connected,  when  the  edges  are  viewed  as 
being  undirected. 

Lemma A in the Appendix (as adapted from Lemma 2 in [16]) shows that a $v$-vertex
$e$-edge superdependency graph has the same local probability distribution as its dependency
counterpart. In particular, let $G$ be a superdependency structure with $v$ vertices and $v - 1$ edges.
We  may  deduce  that  v  keys  hash  into  the  structure  defined  by  G  with  probability  ■ 

Moreover, the chances are only $O(v^3 n^{-v})$ that the vertices will hash into a superdependency
graph with $v$ vertices and $v$ or more edges, which yields $G$ as its omnidirected spanning tree. The
proof of Lemma A is presented in the Appendix, so that the basic argument can be repeatedly
applied without elaboration.

This probability distribution suggests that hashing statistics might be quantified from the
behavior of superdependency DAGs with $e = v - 1$ edges. But before this notion can be exploited,
we will have to reduce the (expected) number of local superdependency graphs that can occur.

A way to control this expectation is by extending the definitions of superdependency graphs
to include our vacancy criterion. The vacancy criterion requires that any dummy item $z$ in a
local superdependency set $D'$, which locally hashes into a location $\ell$, must not have $h$ or fewer
items preceding it in $D$ which hash into a local structure rooted at $\ell$. Otherwise we recognize
that the local dependency set cannot be global, since $\ell$ is known to be already occupied.

The  following  definitions  incorporate  our  vacancy  criterion  into  superdependency  graphs. 

Definition  7. 

• The $h$-superdependency set of $x$, $sdep^{(h)}(x, D)$, is the union of all local dependency sets of
$x$ in which all elements find their locations empty according to vacancy criterion $\mathcal{M}^{(h)}$,
which is evaluated with respect to all of $D$:

$sdep^{(h)}(x, D) = \bigcup_{D' \subseteq D} \{dep^{(h)}(x, D')\}$,

where a local $h$-dependency set $S = dep^{(h)}(x, D')$ is defined to be a (possibly empty)
subsequence $S \subseteq D'$ that hashes by itself into a local dependency graph rooted by $x$, and,
at the insertion time of each $s \in S$, selects the apparent location $\ell$ where the following hold.

i) Location $\ell$ is vacant at the time $s$ is hashed from the sequence $D'$.

ii) Location $\ell$ satisfies the vacancy criterion $\mathcal{M}^{(h)}$ at the time $s$ is hashed from the
sequence $D$.

• The $h$-superdependency graph of $x$, $G^{(h)}_{sdep}(x, D)$, comprises $sdep^{(h)}(x, D)$ plus the edges that
occur from collisions when $sdep^{(h)}(x, D)$ is hashed by itself.

• We also define $G^{(h)}_{sdep}(x, D')$, the superdependency graph of $x$ when only $D'$ is hashed. The
vacancy criterion $\mathcal{M}^{(h)}$, however, is always taken with respect to all of $D$.

• Finally, we define the set $G^{(h)}_{sup}(x, D)$, which contains, for each prefix of the probe sequence
for $x$, the resulting $h$-superdependency graph.

When the full vacancy criterion is used (i.e., $h = n$), of course, $x_j$ will only root $probe_j$
different local superdependency graphs, which are also global. While the number of graphs
grows considerably when weaker criteria are used, we still have the following formulation for
any $h$ (and actually any vacancy criterion).

$E[probe_i] = \sum_{k \ge 0} Prob\{degr(G^{(h)}_{sdep}(x_i, D)) \ge k\}$, (2)

where $degr(G^{(h)}_{sdep}(x_i, D))$ is the degree of $x_i$ in the graph.
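Equation (2) has the shape of a tail-sum identity: for a non-negative integer root degree, summing $Prob\{degree \ge k\}$ over $k \ge 0$ gives one plus the expected degree, and under the full vacancy criterion the degree is one less than the probe count. The sketch below checks only this arithmetic identity on a made-up finite distribution; the function names and the distribution are assumptions, not quantities taken from the paper.

```python
from fractions import Fraction
from typing import Dict


def tail_sum(dist: Dict[int, Fraction]) -> Fraction:
    """Sum over k >= 0 of Prob{deg >= k}, where dist[d] = Prob{deg = d}."""
    top = max(dist)
    return sum(
        (sum((p for d, p in dist.items() if d >= k), Fraction(0))
         for k in range(top + 1)),
        Fraction(0),
    )


def mean_plus_one(dist: Dict[int, Fraction]) -> Fraction:
    """1 + E[deg]: the expected probe count when probes = degree + 1."""
    return 1 + sum((d * p for d, p in dist.items()), Fraction(0))


if __name__ == "__main__":
    # Hypothetical root-degree distribution, for illustration only.
    dist = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
    print(tail_sum(dist), mean_plus_one(dist))   # 5/3 5/3
```

The $k = 0$ term always contributes 1, which accounts for the final, successful probe.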

Our counting of superdependency sets requires a little explanation. It is helpful, for the sake of
exposition, to limit the discussion to omnidirected spanning trees of superdependency graphs,
which have one fewer edge than nodes, although there may be many dummy roots. Imagine,
for the moment, that all edges are undirected. Take the root of such a structure to be the
vertex with highest index, call it $x$, and impose a redirection of the edges based on search from
$x$, so that the structure is now an actual tree. A dependency tree $R_x$ can be prescribed by
listing, for each vertex $v$ in the graph $R_x$, the set of descendant vertices reachable from the first
probe for $v$, from the second, ..., up to the last probe that collided with items in the tree. This
formulation was used in Theorem 1 of [16] to attain a recursive count of the expected number
of dependency graphs rooted by $x$.

For our redirected superdependency tree, however, such a representation $R_x$ is ambiguous:
the $i$th edge from $v$ is either to the vertex $w$ with highest index in the $i$th set, or, if $w$ is a
dummy, to $last(w)$, which is the item with highest index in $w$'s last subset, or, if $last(w)$ is
also a dummy, to $last^2(w)$, and so forth. In a dependency prescription $R_x$, there is no such
ambiguity; the probe edge must be directed to the vertex with the highest index in the set. But
for superdependency trees, it follows that up to negligible terms, $2^{\delta} \times P(R_x)$ is the probability
that a specified set of keys $D_x$ in $R_x$ hash locally into some superdependency structure $G_x$
whose undirected structure matches the tree $R_x$, where $P(R_x)$ is the probability that $D_x$ hashes
as a dependency graph into the structure prescribed by $R_x$, and $\delta$ is the number of nodes that
can legally appear as dummies, according to the vacancy criterion as applied to $G_x$ but without
regard for the hashing on $D - D_x$.

As a very weak consequence of the vacancy criterion, a node $z$ in an $h$-superdependency graph
$G^{(h)}_{sdep}(x, D)$ cannot be a dummy unless $z$ has more than $h$ descendants in $G^{(h)}_{sdep}(x, D)$.
Definition 8.

• Let this weak consequence be named the polyexclusion principle.

We note that a stronger consequence can also be deduced: a hash key $z$ cannot be a dummy
that is apparently (successfully) hashed into its $\ell$-th probe location if its $\ell$-th probe turns out
to store the root of any local dependency tree of $j < h$ items and whose elements belong to
$sdep(z, D)$.

We shall begin by recursively counting tree structures. This counting is more easily done
by counting dependency representations $R_x$, and including a multiplier for each node that can
be interpreted as a dummy.

Given a representation $R_x$, a top-down embedding of the tree has two possible constructions
for each node that might be a dummy: one which can occur to a locally vacant location with
vacancy probability $q^{(h)}$, and one that can occur to a location that already stores the
root of a DAG that has at least $h + 1$ keys. To get all partial superdependency graphs, we again
count structures where the root is not yet embedded.

Theorem 1. Let $\alpha < 1$ and $r$ be fixed constants. Let $c$, $d$ and $h$ be sufficiently large and fixed.
Let $\psi = d \log n$. Let $s(k, \alpha n)$ be the recursively defined overestimate, as quantified below, of the
expected number of local $h$-superdependency trees that are rooted by $x_{\alpha n}$, contain $k$ items, and
satisfy vacancy criterion $M^{(h)}$, in $UH$, $DH$, and $DH^{\psi}$. Then

1) The expected number of such trees that have $c \log n$ nodes or less is bounded:

$$\sum_{k \le c \log n} s(k, \alpha n) = O(1).$$

2) The expected number of such trees with $c \log n$ nodes or less is bounded even when a tree
of $k$ nodes is weighted by $k^r$: $\sum_{k \le c \log n} k^r s(k, \alpha n) = O(1)$.

3) The expected number of such trees having a node count in the interval $[c \log n, 2c \log n]$ is
$O(n^{-2})$ even when a tree of $k$ nodes is weighted by $n^r$: $\sum_{c \log n \le k \le 2c \log n} s(k, \alpha n) = O(n^{-2-r})$.

Proof: Let $s(k, a)$ be the sum, over all of the $(k-1)$-item subsequences in $x_1, \ldots, x_{a-1}$,
of the (over)estimated probability that the tuple plus $x_a$ hashes locally into a superdependency
tree whose non-dummy nodes other than $x_a$ find vacant locations, according to $M^{(h)}$. Nodes
that can be dummies yield alternative embeddings (superdependency structures), depending
upon whether they are interpreted as dummies or not. Dummy interpretations will be given
an embedding probability of 1 to avoid multiplying the dependent estimates that a given hash
location satisfies the vacancy criterion $M^{(h)}$ at different (locally successful, globally unsuccessful)
insertion times. We admit a superset of candidate keys as potential dummies by applying the
polyexclusion principle as an overestimate of the vacancy criterion.

Let $\mu$ be the constant 1. (We shall later have cause to solve the same system with $\mu = 3$.)
By recurring on the subtree rooted at the most recent probe location of $x_a$, we get:

$$s(1, a) = 1,$$

$$s(k, a) \le \sum_{\substack{h < j \le k-1 \\ 0 \le b \le a-1}} s(j, b)\,\frac{\mu + q^{(h)}(b)}{n_1}\,s(k-j, a) \;+ \sum_{\substack{1 \le j \le h \\ 0 \le b \le a-1}} s(j, b)\,\frac{q^{(h)}(b)}{n_1}\,s(k-j, a), \qquad k > 1. \tag{2}$$

We replace the quotient $\frac{\mu + q^{(h)}(b)}{n_1}$ by $\frac{q^{(h)}(b)}{n_1}$, for $j \le h$, because the root of a subtree of size
$h$ or less cannot be interpreted to be a local dummy, according to the polyexclusion principle.
Here, $\frac{1}{n_1}$ denotes an upper bound for the probability of probing a given location. We
now suppress the subscript and use the simpler value $n$, which can only change the resulting
$k$'th coefficient by a factor of 1 plus a term that turns out to be negligible in all cases.

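We may overestimate $s$ by solving the system

$$t(1, a) = 1,$$

$$t(k, a) = \sum_{\substack{h < j \le k-1 \\ 0 \le b \le a-1}} t(j, b)\,\frac{\mu + q^{(h)}(b)}{n}\,t(k-j, a) \;+ \sum_{\substack{1 \le j \le \min(h, k-1) \\ 0 \le b \le a-1}} t(j, b)\,\frac{q^{(h)}(b)}{n}\,t(k-j, a), \qquad k > 1.$$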
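The recurrence for $t$ can be iterated numerically on a small instance. The following is a minimal sketch under stated assumptions, not part of the original analysis: it takes the vacancy probability to be the limiting form $q^{(h)}(b) = 1 - b/n$ (the text only asserts convergence to $1 - \beta$), and the names `tree_weights`, `a_max`, `K` and `mu` are hypothetical. Prefix sums over $b$ keep each coefficient cheap to evaluate.

```python
def tree_weights(n, a_max, K, h, mu):
    """Return t[k][a] for 1 <= k <= K and 0 <= a <= a_max.

    Implements t(1,a) = 1 and, for k > 1,
      t(k,a) = sum_{j=1}^{k-1} P(j,a) * t(k-j,a),
    where P(j,a) = sum_{b<a} t(j,b) * w_j(b) / n and the weight w_j(b) is
    q(b) for j <= h and mu + q(b) for j > h.
    ASSUMPTION: q(b) = 1 - b/n, the limiting vacancy probability.
    """
    q = [1.0 - b / n for b in range(a_max + 1)]
    t = [[0.0] * (a_max + 1) for _ in range(K + 1)]
    # pre[j][a] holds the prefix sum P(j, a) over b = 0 .. a-1.
    pre = [[0.0] * (a_max + 2) for _ in range(K + 1)]
    for k in range(1, K + 1):
        for a in range(a_max + 1):
            if k == 1:
                t[k][a] = 1.0
            else:
                t[k][a] = sum(pre[j][a] * t[k - j][a] for j in range(1, k))
        for a in range(a_max + 1):
            weight = q[a] + (mu if k > h else 0.0)
            pre[k][a + 1] = pre[k][a] + t[k][a] * weight / n
    return t


def total_weight(t, a):
    """Sum of t(k, a) over all computed tree sizes k."""
    return sum(row[a] for row in t[1:])
```

Under the stated assumption and with `mu = 0`, the sum over $k$ of $t(k, a)$ telescopes to $n/(n-a)$, which is the familiar leading term $1/(1-\alpha)$ at $a = \alpha n$. With `mu = 1` the totals are larger, and the argument in the text relies on $h$ being sufficiently large to keep them bounded.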

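Set $\tau(k, a) = t(k, a)\,q^{(h)}(a)$, $\sigma(k, a) = t(k, a)(\mu + q^{(h)}(a))$, and $\widehat{T}(j, a) = \sum_{0 \le b \le a-1} t(j, b)\,n^{-1}$. The
following equivalent system is attained.

$$\begin{aligned}
\tau(1, a) &= q^{(h)}(a), \\
T(j, a) &= \frac{1}{n} \sum_{0 \le b \le a-1} \tau(j, b), \\
\tau(k, a) &= \sum_{1 \le j \le k-1} T(j, a)\,\tau(k-j, a), \qquad k > 1, \\
\widehat{T}(j, a) &= \frac{1}{n} \sum_{0 \le b \le a-1} \frac{\tau(j, b)}{q^{(h)}(b)}, \\
\sigma(1, a) &= q^{(h)}(a) + \mu, \\
S(j, a) &= \frac{1}{n} \sum_{0 \le b \le a-1} \sigma(j, b), \\
\sigma(k, a) &= \sum_{h < j \le k-1} S(j, a)\,\sigma(k-j, a) + \sum_{1 \le j \le \min(h, k-1)} T(j, a)\,\sigma(k-j, a), \qquad k > 1.
\end{aligned} \tag{3}$$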

Let $q$ be a continuous monotone rescaled interpolant of $q^{(h)}$, that is defined on the domain
$[0, 1]$. In particular, set $q(\beta) = 1$, for $\beta < 1/n$, and $q(\beta) = q^{(h)}(\beta n - 1)$, for $\beta = i/n$. We will use
the customary notation $f_\beta = \frac{\partial f}{\partial \beta}$ for any function $f$ of $\beta$.

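Then the solutions $T$, $\tau$, $S$, $\sigma$, and $\widehat{T}$ can be tightly overestimated by $\mathcal{T}$, $\mathcal{T}_\beta$, $\mathcal{S}$, $\mathcal{S}_\beta$, and $\widehat{\mathcal{T}}$ in
the system:

$$\mathcal{T}(k, 0) = 0,$$

$$\mathcal{T}_\beta(1, \beta) = q(\beta),$$

$$\mathcal{T}_\beta(k, \beta) = \sum_{1 \le j \le k-1} \mathcal{T}(j, \beta)\,\mathcal{T}_\beta(k-j, \beta), \qquad k > 1, \tag{4}$$

$$\widehat{\mathcal{T}}(k, \beta) = \int_0^\beta \frac{\mathcal{T}_\alpha(k, \alpha)}{q(\alpha)}\,d\alpha, \tag{5}$$

$$\mathcal{S}(k, 0) = 0,$$

$$\mathcal{S}_\beta(1, \beta) = q(\beta) + \mu,$$

$$\mathcal{S}_\beta(k, \beta) = \sum_{h < j \le k-1} \mathcal{S}(j, \beta)\,\mathcal{S}_\beta(k-j, \beta) + \sum_{1 \le j \le \min(h, k-1)} \mathcal{T}(j, \beta)\,\mathcal{S}_\beta(k-j, \beta), \qquad k > 1.$$

The domains of $\mathcal{T}$ and $\mathcal{S}$ are $[1, \infty) \times (0, 1)$. Here $\mathcal{S}(j, \beta)$, $\mathcal{T}(j, \beta)$ and $\widehat{\mathcal{T}}(j, \beta)$ are tight
overestimates of $S(j, n\beta)$, $T(j, n\beta)$ and $\widehat{T}(j, n\beta)$, respectively. It is not difficult to see that
$S(k, \beta n) \le \mathcal{S}(k, \beta)$, and $\sigma(k, \beta n) \le \mathcal{S}_\beta(k, \beta)$.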

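For convenience, we rewrite the system for $\mathcal{S}$ as:

$$\begin{aligned}
\mathcal{S}(k, 0) &= 0, \\
\mathcal{S}_\beta(k, \beta) &= 0, \qquad k \le 0, \\
\mathcal{S}_\beta(1, \beta) &= \mu + q(\beta), \\
\mathcal{S}_\beta(k, \beta) &= \sum_{1 \le j} \mathcal{S}(j, \beta)\,\mathcal{S}_\beta(k-j, \beta) - \mu \sum_{1 \le j \le h} \widehat{\mathcal{T}}(j, \beta)\,\mathcal{S}_\beta(k-j, \beta), \qquad k > 1,
\end{aligned} \tag{6}$$

and put

$$w(x, \beta) = \sum_{0 < k} \mathcal{S}(k, \beta)\,x^{k-1}, \tag{7}$$

$$g(x, \beta) = \sum_{0 < k} \mathcal{T}(k, \beta)\,x^{k-1}. \tag{8}$$

Define the truncated function

$$G^{(h)}(x, \beta) = \sum_{0 < k \le h} \widehat{\mathcal{T}}(k, \beta)\,x^{k-1}, \tag{9}$$

and define

$$G(x, \beta) = \sum_{0 < k} \widehat{\mathcal{T}}(k, \beta)\,x^{k-1}. \tag{10}$$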

Differentiating (7) and expanding via (6) and (9) gives

$$w(x, 0) = 0,$$

$$w_\beta(x, \beta) = x\,w(x, \beta)\,w_\beta(x, \beta) - \mu x\,G^{(h)}(x, \beta)\,w_\beta(x, \beta) + \mu + q(\beta). \tag{11}$$

Manipulating (11) gives

$$w_\beta(x, \beta) = \frac{\mu + q(\beta)}{1 + \mu x\,G^{(h)}(x, \beta) - x\,w(x, \beta)}, \tag{12}$$

where we wish to determine when $w(1, \beta)$ is finite.

Differentiating (8) via (4) gives

$$g(x, 0) = 0,$$

$$g_\beta(x, \beta) = x\,g(x, \beta)\,g_\beta(x, \beta) + q(\beta). \tag{13}$$

Integrating (13) as is, and applying the quadratic formula gives

$$g(x, \beta) = \frac{1 - \sqrt{1 - 2xQ(\beta)}}{x},$$

where

$$Q(\beta) = \int_0^\beta q(\gamma)\,d\gamma, \tag{14}$$

and hence

$$g_\beta(x, \beta) = \frac{q(\beta)}{\sqrt{1 - 2xQ(\beta)}}. \tag{15}$$
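The closed form obtained by integrating the differential equation for $g$ can be checked against the coefficient recurrence it encodes. The sketch below is illustrative and the function names are hypothetical: for a fixed value of $Q$, integrating the recurrence for the coefficients of $g$ gives $c_1 = Q$ and $c_k = \frac{1}{2}\sum_{j=1}^{k-1} c_j c_{k-j}$, and the closed form predicts $c_k = C_{k-1} Q^k / 2^{k-1}$ with $C_m$ the Catalan numbers.

```python
from math import comb


def series_coeffs(Q, K):
    """Coefficients c[1..K] of g(x) = sum_k c[k] x^(k-1) solving g - x*g^2/2 = Q."""
    c = [0.0] * (K + 1)
    c[1] = Q
    for k in range(2, K + 1):
        # Coefficient of x^(k-1) in x*g^2/2 collects all pairs with j + (k-j) = k.
        c[k] = 0.5 * sum(c[j] * c[k - j] for j in range(1, k))
    return c


def closed_form(Q, x):
    """g(x) = (1 - sqrt(1 - 2xQ)) / x, valid for 0 < x <= 1/(2Q)."""
    return (1.0 - (1.0 - 2.0 * x * Q) ** 0.5) / x


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)
```

Taking $Q = \alpha - \alpha^2/2$ with $\alpha = 1/2$ (the limiting value of $Q$ under $q(\beta) = 1 - \beta$) and $x = 1$, both the truncated series and the closed form evaluate to $\alpha$.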

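In view of (5) and (8), we see that $G(x, \beta)$ as defined in (10) can also be expressed as:

$$G(x, \beta) = \int_0^\beta \frac{g_b(x, b)}{q(b)}\,db. \tag{16}$$

Finally, let

$$\zeta(x, \beta) = \int_0^\beta g_b(x, b)\,\frac{\mu + q(b)}{q(b)}\,db \tag{17}$$

$$= g(x, \beta) + \mu G(x, \beta). \tag{18}$$

Then (17) gives

$$\zeta_\beta(x, \beta) = g_\beta(x, \beta)\,\frac{\mu + q(\beta)}{q(\beta)}, \tag{19}$$

and expanding with (13) gives

$$\zeta_\beta(x, \beta) = x\,g(x, \beta)\,g_\beta(x, \beta)\,\frac{\mu + q(\beta)}{q(\beta)} + \mu + q(\beta),$$

whence applying (19) gives

$$\zeta_\beta(x, \beta) = x\,g(x, \beta)\,\zeta_\beta(x, \beta) + \mu + q(\beta),$$

so that (18) can be used to yield

$$\zeta_\beta(x, \beta) = x\,\bigl(\zeta(x, \beta) - \mu G(x, \beta)\bigr)\,\zeta_\beta(x, \beta) + \mu + q(\beta),$$

whereupon collecting the $\zeta_\beta$ terms results with

$$\zeta_\beta(x, \beta) = \frac{\mu + q(\beta)}{1 + \mu x\,G(x, \beta) - x\,\zeta(x, \beta)}. \tag{20}$$

From (19) and (15), we see that $\zeta_\beta$ can also be expressed as:

$$\zeta_\beta(x, \beta) = \frac{\mu + q(\beta)}{\sqrt{1 - 2xQ(\beta)}}. \tag{21}$$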

Since our interpolated $q(\beta)$ converges uniformly to $1 - \beta$ on $[0, \alpha]$ as $h$ (and $n$) $\to \infty$, $Q(\beta)$ as
defined in (14) converges uniformly to $\beta - \beta^2/2$, and $\zeta_\beta(x, \beta)$ as expressed in (21) will converge
uniformly to $\frac{\mu + 1 - \beta}{\sqrt{1 - 2x(\beta - \beta^2/2)}}$ on, say, the domain $\Gamma = \bigl[0, 1 + \frac{(1-\alpha)^2}{2}\bigr] \times [0, \alpha]$.

Now $g_\beta(x, \beta)$ as expressed in (15) will converge uniformly on the same domain $\Gamma$, to
$\frac{1 - \beta}{\sqrt{1 - 2x(\beta - \beta^2/2)}}$, and $G(x, \beta)$ will converge uniformly and coefficient-wise on $\Gamma$ (identity (16)).

But $G^{(h)}$ is the truncated function of $G$. Hence $|G(x, \beta) - G^{(h)}(x, \beta)|$ is uniformly bounded by $\epsilon_h$,
where $\epsilon_h \to 0$ as $h$ (and $n$) $\to \infty$. The point is that $G$ is analytic in $x$, and its coefficients will be
uniformly exponentially decreasing once $h$ and $n$ exceed some fixed constants.

Now consider the differential equations defined by (12) and (20). We see that $w_\beta$ will
converge uniformly with $\zeta_\beta$ to $\frac{\mu + 1 - \beta}{\sqrt{1 - 2x(\beta - \beta^2/2)}}$ as $\epsilon_h \to 0$ and $q^{(h)}$ and $G(x, \beta)$ converge (as $n$
also goes to $\infty$). In particular, for some fixed sufficiently large $h$, $w_\beta$ will be bounded on
$\Gamma$. But then the analyticity of $w_\beta$ for $|x| < 1 + \frac{(1-\alpha)^2}{2}$ guarantees that its coefficients
$\mathcal{S}_\beta(k+1, \beta) = \Omega\bigl(s(k, \beta n)(\mu + q^{(h)}(\beta n))\bigr)$ are bounded by the exponentially decaying $O\bigl((1 + \frac{(1-\alpha)^2}{2})^{-k}\bigr)$.
The size of $h$ and the decay rate are independent of $n$ because the equations are.
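The location of the singularity behind this exponential decay can be worked out explicitly in the limiting case. The derivation below is a sketch that assumes $q(\beta) = 1 - \beta$ exactly, whereas the argument above only uses uniform convergence to it; it shows that the branch point of $\sqrt{1 - 2xQ(\beta)}$ lies strictly beyond $x = 1$ for every load $\beta \le \alpha < 1$, and that the limit is finite at $x = 1$.

```latex
\begin{align*}
Q(\beta) &= \int_0^\beta (1-\gamma)\,d\gamma
          = \beta - \tfrac{\beta^2}{2}
          = \tfrac{1}{2}\bigl(1 - (1-\beta)^2\bigr), \\
1 - 2xQ(\beta) &= 1 - x\bigl(1 - (1-\beta)^2\bigr), \\
x^*(\beta) &= \frac{1}{1-(1-\beta)^2}
   \;\ge\; \frac{1}{1-(1-\alpha)^2}
   \;>\; 1 + (1-\alpha)^2
   \qquad (0 < \beta \le \alpha < 1), \\
\left.\frac{\mu + 1 - \beta}{\sqrt{1 - 2xQ(\beta)}}\right|_{x=1}
   &= \frac{\mu + 1 - \beta}{1 - \beta}
   \;\le\; \frac{\mu + 1 - \alpha}{1 - \alpha}.
\end{align*}
```

Here $x^*(\beta)$ denotes the branch point in $x$; the third line uses $1/(1-\epsilon) > 1 + \epsilon$ for $0 < \epsilon < 1$ with $\epsilon = (1-\alpha)^2$.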

In  view  of  the  uniform  convergence,  we  conclude  that: 


1) For large enough constant $h$, the expected number of local superdependency trees rooted
by $x_{\alpha n}$ that have $c \log n$ nodes or less is bounded by $O\bigl(w_\beta(1, \beta)\,|_{\beta=\alpha}\bigr)$.

2) For large enough constant $h$, the expected number of local superdependency trees rooted
by $x_{\alpha n}$ that have $c \log n$ nodes or less when a tree of $k$ nodes is weighted by $k^r$, is bounded
by $O\bigl((x \frac{\partial}{\partial x})^r w_\beta(x, \beta)\,|_{\beta=\alpha,\,x=1}\bigr)$.

3) For large enough constant $h$, the expected number of local superdependency trees rooted by
$x_{\alpha n}$ and having a node count in the interval $[c \log n, 2c \log n]$ is $O(c\,n^{-2-r} \log n)$ for suitable


‘-oigE#)- 

The bound on $\psi$ need only be large enough to ensure that the events in a sample of $2c \log n$
items behave as if fully independent, along with (proportionally sized) additional samples, to
ensure that the vacancy estimator also has a statistical behavior equivalent to the fully
independent case (c.f. Theorem 3 of [16]). ∎


Corollary 1. Let, as in Theorem 1, $\alpha < 1$, and $r$ be fixed. Let $c$ and $d$ be sufficiently large
and fixed with $\psi = d \log n$. Then for sufficiently large constant $h$, the probability, in $UH$, $DH$,
and $DH^{\psi}$, that $x_{\alpha n}$ roots a local superdependency tree that satisfies $M^{(h)}$ and contains $2c \log n$
nodes or more is $O(n^{-r-1})$.

Proof: Such a tree must contain a local superdependency (sub)tree of $\ell$ nodes for some
$\ell \in [c \log n, 2c \log n]$, and root in $D$. From Theorem 1, the expected number of such trees with
specific root $x_\beta$ is $O(n^{-r-2})$. Summing this probability over all possible roots $x_\beta$ gives a bound
of $O(n^{-r-1})$. ∎

Theorem 2. Let, as in Theorem 1, $\alpha < 1$, and $r$ be fixed. Let $c$ and $d$ be sufficiently large
and fixed with $\psi = d \log n$. Let $N_k$ be the number of local superdependency sets that occur,
are rooted by $x_{\alpha n}$, contain $k$ items, satisfy the vacancy criterion $M^{(h)}$, and are not trees.
Then for $h$ sufficiently large but fixed, in $UH$, $DH$, and $DH^{\psi}$, $E[\sum_{k \le c \log n} k^r N_k] = O(1)$, and

Sc  log  n<k<2c  log  n  0(n  ). 

Proof: We follow the proof schema of Theorem 1. Let $v(k, a)$ be the sum, over all of the
subsequences of $k - 1$ items in $x_1, \ldots, x_{a-1}$, of the (over)estimated probability that the tuple
plus $x_a$ hashes locally into a superdependency DAG whose non-dummy nodes other than $x_a$
find vacant locations, according to $M^{(h)}$. It is convenient to count the number of tree-like
structures plus extra-edge DAGs, and subtract the tree counts. We can justify subtracting the
tree overcounts by using a recurrence that introduces a tag variable $z$ for DAGs that have extra
edges, and take $(g(z) - g(0))\,|_{z=1}$ as our overestimate for the expected number of DAGs that
are not trees.

The counting will again be over the $R_x$ prescriptions of dependency trees with weightings
to account for all possible dummies that permit restructuring into different superdependency
trees, with additional factors to account for non-tree edges as in Lemma A. Let $v(k, a)$ be an
overestimate of the expected number of $h$-superdependency DAGs rooted at an unembedded $x_a$,
which have $k$ nodes. We again recur on the subtree rooted at the most recent probe location
of $x_a$, and again appeal to the polyexclusion principle to reduce the factor $\frac{\mu + q^{(h)}(b)}{n}$ to $\frac{q^{(h)}(b)}{n}$,
when applied to embed a $v(j, b)$, where $j \le h$. But there is a catch. If the subDAG rooted by
$x_b$ has a DAG probe edge that points outside the subDAG, then the resulting structure may be
too large for the vacancy criterion to declare the apparent location of root $x_b$ as being occupied
by time $b$. Following a rather pessimistic bent, we restore the factor to be $\frac{\mu + q^{(h)}(b)}{n}$, in this
case. This restoration is done by applying it selectively to the terms of the generating function
that carry tag factors of $z$. This can be formulated as the factor $\frac{q^{(h)}(b) + \mu(\mathrm{eval}_{z=1} - \mathrm{eval}_{z=0})}{n}$, where
$\mathrm{eval}_{z=x}$ evaluates the factors to its right with $x$ substituted for each appearance of $z$. This
prescription gives the following.

$$v(1, a) = 1,$$

$$\begin{aligned}
v(k, a) = {} & \sum_{\substack{1 \le j \le h \\ 0 \le b \le a-1}} v(k-j, a)\,\frac{q^{(h)}(b) + \mu(\mathrm{eval}_{z=1} - \mathrm{eval}_{z=0})}{n}\,\bigl(1 + zO(k^3/n)\bigr)\,v(j, b) \\
& + \sum_{\substack{h < j \le k-1 \\ 0 \le b \le a-1}} v(k-j, a)\,\frac{q^{(h)}(b) + \mu}{n}\,\bigl(1 + zO(k^3/n)\bigr)\,v(j, b).
\end{aligned}$$

The probability that root $x_b$ sustains a (tagged) DAG probe to one of the $k$ nodes is, in our
models $DH$ and $DH^{\psi}$, bounded by $O(k^3/n)$. The expected number of non-tree DAG formations,
for our hashing schemes, is derived in Lemma A, which uses DFS traversal to attain a canonical
tree from the DAG, and estimates the number of ways additional probes could cause a DAG
that is not a tree. The $O(k^3/n)$ factor accounts for the non-tree DAG probing by the root,
which can have one extra probe to a member of its local superdependency set, or more, in which
case $x_b$ might not be placed in a random location, due to the limited independence for the probe
sequence of an individual hash key. The tag variable $z$ records the fact that we are counting
structures that are not trees.

The  system  can  be  crudely  bounded  as  follows.  Let  w  solve 

«’(l,o)  =  1, 

w(k,a)=  J2  w(k~J,a)q(  b)+  J2  Wik-Jia)q{  }^  +  lwUib)- 

h<j<k- 1 

0<6<a-l  0<6<a-l 

Then  it  is  not  difficult  to  see  that 

.  (  “(C“)(l  +  z0(2))'-l(l  +  i-Odjjlj)'-1)),  for  1  <  h, 

■  "a)<  (^,o)(1  +  20(1))'-1(1  +  J0((52t)'-))'-i,  for  Oh. 

To  establish  the  bound,  it  suffices  to  multiply  the  recurrence  for  w(l,  a)  by 

(  (l+zO(£))<-1(l  +  *0((5gJ)«)),  for  f  <  k, 

l  (1  +20(£))'-1(l  +  for  l  >  h, 
and appeal to induction. We leave the intermediate minutiae to the reader.

k(-K)h+k4 

Thus  the  tagged  contribution  v(k,a)  |  j  —  v(k,a)  |  0  is  bounded  by  0( — - )s(k1a), 

where  s(k,a)  is  as  in  Theorem  1.  The  conclusions  follow.  | 

Corollary 2. Let, as in Theorem 1, $\alpha < 1$, and $r$ be fixed. Let $c$, $d$ and $h$ be sufficiently large
and fixed with $\psi = d \log n$. Let $T$ be a subsequence of the indices $(1, 2, \ldots, \alpha n)$, with $|T| \le c \log n$,
and $T_{|T|} = \alpha n$, so that $D_T$ is a subsequence in $D = (x_1, x_2, \ldots, x_{\alpha n})$, and $(D_T)_{|T|} = x_{\alpha n}$. Let $G$
be a local superdependency tree structure with $|T|$ nodes, and $e_{T,G}$ be the event that $D_T$ hashes
locally into $G$, and the hashing satisfies the vacancy criterion $M^{(h)}$. Let $X_{T,G}$ be the indicator
function for $e_{T,G}$, so that $X_{T,G}$ is zero if the event does not occur, and one if it does.

1) Let $N_k$ be the expectation of the product of $X_{T,G}$ and the number of local $h$-superdependency
sets $S$, where $S \subset D - D_T$, $|S| \le c \log n$, $S$ hashes into a local superdependency DAG that
satisfies $M^{(h)}$, that is rooted by some item in $D - D_T$, and is not a tree structure, $S$
hashes locally into an embedded structure that intersects the local hashed embedding of
$D_T$, and no proper subset of $S$ exhibits these properties, so that each $S$ is minimal. Then
Ek<dognNkkr  =  0(^)Prob{eTjG}. 

2) Let $Z_k$ be the expectation of the product of $X_{T,G}$ and the number of local superdependency
DAGs that comprise $k \le c \log n$ items solely from $D - D_T$, satisfy $M^{(h)}$, are rooted by
any item in $D - D_T$, and locally hash into a structure that intersects the locally hashed
embedding of $D_T$ in at least two probes, and do not have a smaller subset of keys that
hashes into a local $h$-superdependency DAG that intersects $D_T$ in two or more probes.
Then  Ek<c\ognZkkr  =  0(^)Prob{eT)G}. 

3) Let $Z$ be the expectation of the product of $X_{T,G}$ and the number of local superdependency
DAGs that comprise between $c \log n$ and $2c \log n$ items in $D - D_T$, satisfy $M^{(h)}$, and are rooted
by any item in $D - D_T$. Then for sufficiently large constant $h$ and $c$, $Z < n^{-r}\,\mathrm{Prob}\{e_{T,G}\}$.

Proof:  These  statements  follow  immediately  from  Theorems  1  and  2,  where  the  analog  of 

these  claims  without  a  T  component  are  established,  from  the  multiplicativity  of  the  vacancy 




estimator and from Lemma A, which quantifies the hashing statistics of subsets of $\psi$ items or
less, for $DH^{\psi}$. For example, if $\tilde{D}$ is a candidate subsequence of $k$ items in $D - D_T$, it can
intersect the local hashing of $D_T$ in $DH^{\psi}$ with a probability bounded by $O(|T|^2 k^3)/n^2$, for
Case 2. We count minimally sized structures that intersect $D_T$ to avoid the degenerate cases
where a loss of randomness occurs, and so many intersections occur that the vacancy criterion
is affected, since it is not multiplicative when evaluated at one location for two different times.
Similarly, the intersection and non-tree DAG requirements introduce a comparable factor for
Case 1. These factors multiply the count of local $k$-item $h$-superdependency DAGs that occur.
This count has an extra factor of $O(n)$ when compared to the expected number of such structures
rooted by $x_{\alpha n}$, since the root is not restricted to be a specific key.

The size bound for $\psi$ should be doubled, since the statistics are subsets of data that are
twice as large. ∎

Corollary 2 helps confirm that superdependency trees account for most of the hashing
statistics. This observation can be formalized as follows.

Corollary 3. For $|T| = O(\log n)$, large enough constant $h$, and $\psi = O(\log n)$, the following two
events are asymptotically equivalent: ($D_T$ hashes into a local $h$-superdependency tree rooted
by $x_{\alpha n}$), and the event ($D_T$ hashes into a local $h$-superdependency tree rooted by $x_{\alpha n}$, and the
$h$-superdependency graph of $x_{\alpha n}$ is a tree).

Proof: If $D_T$ hashes into a local $h$-superdependency tree and $sdep^{(h)}((D_T)_{|T|}, D)$ is not
a tree, then one of the three cases in Corollary 2 must be applicable, whence the conclusion
follows. ∎

We now need an inclusion-exclusion formula for trees. It will be convenient to change our
counting method for superdependency trees. Our recurrence equations counted them as true
trees, with possible dummy nodes that actually bundled different superdependency structures
together with a single tree representative, and with multiplicative factors designed to give a
weighting that overcounted the expected number of such bundled structures.

The errors that result from our inclusion-exclusion formula will be especially important to
bound.

Corollary 4. Let $r$ be fixed, and let $c$, $d$, and $h$ be sufficiently large and fixed. Let $\chi_t^{\rho}$ be the
indicator function that is 1 when $x_t$ roots a local $h$-superdependency tree with a node count
that is in the interval $[c \log n, \rho \log n]$ and the globally defined superdependency DAG rooted by
$x_t$ is a tree. Let $N_t$ be the number of local $h$-superdependency trees with root $x_t$ that have
$c \log n$ nodes or less. Let $\psi = d \log n$. Then in $UH$, $DH$, and $DH^{\psi}$, $E[N_{\alpha n} \times \chi_{\alpha n}^{\rho}] = O(n^{-r})$.

Proof: We again appeal to the recursive formulation of Theorem 1, and the earlier
representation of bundled superdependency DAGs with dummies. Let $s(k, a)$ be the weighted
sum, over all of the $(k-1)$-item subsequences in $x_1, \ldots, x_{a-1}$, of the (over)estimated
probability that the tuple plus $x_a$ hashes locally into an $h$-superdependency tree. The weighting, for
each $h$-superdependency tree, is the number of local $h$-superdependency subtrees (with root $x_a$)
contained by the tuple.

Let $G$ be a local $h$-superdependency tree, and suppose that $x$ is a dummy node in $G$. Then we
may use $x$ to construct an $h$-superdependency subtree where $x$ is mistakenly treated as residing
in an unsuccessful probe location of its probing "parent" and eliminate earlier contenders for the
location, or we may eliminate $x$ and its subtree, and use some other dummy or the true member
of the dependency set as the presumed resident of the probe location, or view $x$ as a genuine
non-dummy, which gives yet another embedding structure where $x$'s parent probes a (possibly)
actual embedding location for $x$, rather than $x$'s (locally determined) last unsuccessful probe
location, or construct the $h$-superdependency tree that recognizes $x$ as a dummy. These choices
give four possibilities, which are overcounted by setting $\mu = 3$ in the recurrence below. The
overcount is severe, since only one of the possibilities is multiplied by the vacancy criterion $q(b)$,
and eliminating $x$ eliminates a subtree with its additional interpretations (which this counting
procedure fails to do).
By recurring on the subtree rooted at the most recent probe location of $x_a$, we get:

$$s(1, a) = 1,$$

$$s(k, a) \le \sum_{\substack{h < j \le k-1 \\ 0 \le b \le a-1}} s(j, b)\,\frac{\mu + q^{(h)}(b)}{n_1}\,s(k-j, a) \;+ \sum_{\substack{1 \le j \le h \\ 0 \le b \le a-1}} s(j, b)\,\frac{q^{(h)}(b)}{n_1}\,s(k-j, a), \qquad k > 1,$$

where $\mu = 3$. The analysis of this system is the same as in Theorem 1. The fixed values for $h$ and
$d$ will be larger because the larger value of $\mu$ will slow down the convergence. The convergence,
of course, is still to the system with $\mu = 0$.

We now fix $\rho = 3c$. For this range, we see that in the limit ($h \to \infty$)

$$E[N_{\alpha n}\,\chi_{\alpha n}^{3c}] \le \sum_{c \log n \le k \le 3c \log n} s(k, \alpha n) = O(n^{-r}), \tag{22}$$

which follows from an analysis comparable to that used in Corollary 1. Again the radius of
convergence guarantees that the exponential decay and the resulting boundedness will occur for
sufficiently large constant $h$ and $c$.

For larger $\rho$, we reason as follows. Let $e_{T_1,T_2}$ be the event that $D_{T_1}$ and $D_{T_2}$ hash into local
$h$-superdependency trees. There is no restriction on the identity of the roots. Suppose that $x_{\alpha n}$
roots an $h$-superdependency tree $T$ of $3c \log n$ keys or more, and let $T_1$ be an $h$-superdependency
tree of $c \log n$ keys or less that is rooted at $x_{\alpha n}$. We may extract from $T$ an $h$-superdependency
tree $T_2$ with a vertex count between $c \log n$ and $2c \log n$, but which might not be rooted at $x_{\alpha n}$.
There are two cases: the trees are disjoint, or they intersect. If they intersect, the resulting
union is a tree that can occur with a probability comparable to that of an $h$-superdependency
tree of the same shape. There is the minor difference that the root of $T_2$ need not be successfully
embedded. We account for this statistical variation by including an extra factor of $n$.
In view of this, we see that

E[Nan  x  A’°rn]  <  Probfe^  y2)  +  nE[Nan  x  X%cn\ 

i  ■  .'ihcf: .. <>>.] 

l<c  log  n , 

c  log  n  <  \T<2 1  < 2c  log  n 

<  ^2  Prob{eri  y2}  +  0(??_r+1)  according  to  (22), 

Ti,T2c[l..o;n] 

1^1  i<cl°gn)  (■^l)|T-|_  |= 
c  log  n  <  IT2 1  <  2  c  log  n 
TlnT2=« 

<  E[JVa«]  ^2  Prob{eTT}  +  0(n~r), 

T c[l  ..an] 
c  log  n <  \T  | <  2c  log  n 

=  0(//  1 ),  since  E[iV^]  =  0(1),  and  because  of  the  reasoning  in  Corollary  1. 

As $r$ is arbitrary, the result follows. ∎

We now use a different representation, which gives a one-to-one mapping between
superdependency trees and their representations. While the following representation is tailored to match
the baroque features of superdependency sets, the combinatorial lemmas apply equally well to
more general structures.


Definition  9. 


• Let $D = (x_1, \ldots, x_{\alpha n})$ be a sequence of atoms. We say that $T$ is an AND_OR tree over
$D$ if $T$ is a singleton AND node, or $T$ is finite and the natural parse tree for

$$T = \bigwedge_{i=1}^{k} \bigvee_{j=1}^{h_i} T_{i,j},$$

where each $T_{i,j}$ is an AND_OR tree over $D$. Each AND node of $T$ stores an atom from $D$,
but the OR's do not, and no atom appears in more than one AND node. The descendant
atoms of an atom $y \in T$ precede $y$ in $D$. The adjacency list of each AND node is ordered.
The OR node children only bear the implicit ordering induced by the contents of their AND
roots.

It is easy to see that this description gives a faithful representation for superdependency
trees: an AND node has edges representing the unsuccessful probes by its resident atom. The
OR nodes point to children that all collide at a common location. All but the earliest child of an
OR node are dummies. Some AND_OR trees will represent superdependency trees that violate
the vacancy criterion; this will cause no difficulty, since we shall not use them.

Definition  10. 

• Let $T$ be an AND_OR tree. Then $S$ is an implicant subtree of $T$ if $S$ is an AND_OR tree,
has the same root as $T$, its edges and vertices are contained in $T$, and each AND node in $S$
has the same outdegree as in $T$. It is easy to see that such an $S$ is a local superdependency
tree, and vice versa.

• Let $S \subset T$ denote the property that $S$ is an implicant subtree of $T$.

By definition, $T \subset T$. It is also worth remarking that if $T$ is an AND_OR tree that satisfies our
vacancy criterion, then each implicant subtree $S \subset T$ also satisfies the requirement, since true
vacancy is defined with respect to the atoms in $D$.

Definition  11. 

• Let $T$ be an AND_OR tree. Let $R$ be the set of OR nodes in $T$. The degree of freedom of
$T$, $fr(T)$, is defined to be:

$$fr(T) = \sum_{v \in R} (\mathrm{outdegree}(v) - 1).$$

• Let the node count of $T$ be the number of atoms from $D$ that reside in $T$.

Lemma 1. Let $T$ be an AND_OR tree. Let $\chi_T$ be the indicator function that equals one if the
atoms named in $T$ hash into a structure that satisfies the AND_OR structure stated by $T$, and
in a way that satisfies $M^{(h)}$, and equals zero otherwise. Then

$$\sum_{S \subset T} (-1)^{fr(S)} = 1, \quad \text{and hence} \quad \sum_{S \subset T} (-1)^{fr(S)} \chi_S = \mathrm{eval}(T),$$

where $\mathrm{eval}(T)$ is the Boolean evaluation of the logic statement expressed by $T$ about the probing
behavior of its constituent atoms.

Proof: Consider the first formulation. Its proof is by structural induction over $T$. The
formula holds if $T$ is a single node. If the root AND has $r \ge 2$ children, then we simply note
that $\sum_{S \subset T} (-1)^{fr(S)} = \prod_{j=1}^{r} \sum_{S \subset T_j} (-1)^{fr(S)}$, where $T_j$ comprises the root of $T$ and the $j$-th OR child
as its only child.

If the AND root has only one OR child, which has $r$ children, then
$\sum_{S \subset T} (-1)^{fr(S)} = -(1 - 1)^r + 1$, since all (non-empty) combinations of the OR node's subtrees
can be used to construct $S \subset T$; here structural induction is used to attain the factor 1 that
arises from each such subtree $S$.

Finally, the latter formulation is just a restatement of the former when restricted to the
subtree of $T$ that is logically equivalent to $\bigvee_{S \subset T,\ \chi_S = 1} S$. ∎
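The first identity of Lemma 1 can be checked by brute force on small trees. The sketch below is illustrative only and uses a hypothetical encoding: an AND node is a tuple of OR nodes, an OR node is a non-empty tuple of AND nodes, and atoms are left implicit (children are distinguished by position). An implicant subtree keeps every OR child of each retained AND node and a non-empty subset of each OR node's children.

```python
from itertools import combinations, product


def implicant_subtrees(and_node):
    """Yield every implicant subtree of the AND_OR tree rooted at and_node."""
    per_or = []
    for or_node in and_node:
        options = []
        r = len(or_node)
        for size in range(1, r + 1):
            for kept in combinations(range(r), size):
                # Each kept AND child is independently replaced by one of its own implicants.
                child_choices = [list(implicant_subtrees(or_node[i])) for i in kept]
                for subs in product(*child_choices):
                    options.append(tuple(subs))
        per_or.append(options)
    # The AND node keeps all of its OR children, so one option is chosen per OR child.
    for choice in product(*per_or):
        yield tuple(choice)


def degrees_of_freedom(and_node):
    """fr(T): sum over all OR nodes v in the tree of (outdegree(v) - 1)."""
    return sum(
        len(or_node) - 1 + sum(degrees_of_freedom(child) for child in or_node)
        for or_node in and_node
    )


def signed_sum(tree):
    """Sum of (-1)^fr(S) over all implicant subtrees S of tree."""
    return sum((-1) ** degrees_of_freedom(s) for s in implicant_subtrees(tree))
```

For instance, with `leaf = ()` and `t1 = ((leaf, leaf),)`, the three implicant subtrees of `t1` have degrees of freedom 0, 0 and 1, so the signed sum is $1 + 1 - 1 = 1$, matching the $-(1-1)^r + 1$ step of the proof with $r = 2$.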


Theorem 3. For fixed load $\alpha < 1$ and suitably large fixed $c$, $d$ and $h$, with $\psi \ge d \log n$, the
expected number of probes to insert $x_{\alpha n}$, in $UH$, $DH$, and $DH^{\psi}$, is bounded by $\frac{1}{1-\alpha} + O(\frac{1}{n})$.
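The leading term $\frac{1}{1-\alpha}$ is easy to observe empirically. The following simulation is a rough sketch, not the construction analyzed here: it assumes idealized, fully random hash values drawn from Python's `random` module rather than a limited-independence universal family, uses a prime table size so that every nonzero step visits all locations, and the function names are hypothetical.

```python
import random


def probes_for_last_insert(n, m, rng):
    """Insert m keys by double hashing into a table of prime size n.

    Returns the number of probes (including the final successful one)
    used by the m-th insertion.
    """
    occupied = [False] * n
    probes = 0
    for _ in range(m):
        start = rng.randrange(n)     # first hash value, idealized as uniform
        step = rng.randrange(1, n)   # second hash value, nonzero
        pos, probes = start, 1
        while occupied[pos]:
            pos = (pos + step) % n
            probes += 1
        occupied[pos] = True
    return probes


def mean_probes(n, alpha, trials, seed=0):
    """Average probe count for inserting the (alpha*n)-th key over independent trials."""
    rng = random.Random(seed)
    m = int(alpha * n)
    return sum(probes_for_last_insert(n, m, rng) for _ in range(trials)) / trials
```

With `n = 503` and `alpha = 0.5`, the sample mean over a thousand trials should land near $1/(1-\alpha) = 2$, up to sampling noise.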

Proof: This bound is elementary and well known for $UH$, although (1) gives, in passing,
an independent albeit not so elementary proof of the fact. Rather than evaluate an even more
complicated expression for the expected number of probes in $DH$ and $DH^{\psi}$, we shall formulate
an elaborate expression that, in $UH$, must equal $\frac{1}{1-\alpha} + O(\frac{1}{n})$, and observe that the computational
differences, for the analogous expression in $DH$ and $DH^{\psi}$, comprise an additive $O(\frac{1}{n})$.


Definition  12. 

• Let $Tr_k^{\ell}$ be the set of all fully structured $h$-superdependency DAG-trees (where all dummies
are unambiguously identified as in the AND_OR representation) that have $k$ vertices, and
have outdegree $\ell$ at the root.

The tree property just says that the structure is a tree if we view the edges as undirected.

• Given $T \subset [1, \alpha n - 1]$, let $D_T^{+}$ denote $D_T$ concatenated with $x_{\alpha n}$.

• Given $\mathcal{T} \in Tr_k^{\ell}$, let $(H^{(h)}(D_T^{+}) \to \mathcal{T})$ denote the event that the set $D_T^{+}$ is locally hashed into
the tree structure $\mathcal{T}$, and the actual nodes (apart from the root) are assigned locations that
satisfy $M^{(h)}$ at the times of their insertion in $D$. By definition, we require that $|D_T^{+}| = k$.

• Given an event $E$, let $X(E)$ be the indicator function for $E$, which is one if $E$ occurs, and
zero otherwise.

• Let $E_{tree}$ denote the event: (the (globally defined) $h$-superdependency DAG $G^{(h)}_{sdep}(x_{\alpha n}, D)$
is a tree).

• Let $Tr^{\ell} = \bigcup_{k \le c \log n} Tr_k^{\ell}$.

• Let

$$P_1^{(\ell)} = \sum_{\substack{T \subset (1, 2, \ldots, \alpha n - 1) \\ |T| < c \log n \\ \mathcal{T} \in Tr^{\ell}}} (-1)^{fr(\mathcal{T})}\,X\bigl(H^{(h)}(D_T^{+}) \to \mathcal{T}\bigr) \times X(E_{tree}).$$

Lemma 1 guarantees that $P_1^{(\ell)}$ achieves the correct zero-one values when restricted to the
portion of the event space where $|sdep^{(h)}(x_{\alpha n}, D)| \le c \log n$.

• Let $P_2^{(\ell)}$ be the number of local $h$-superdependency AND_OR trees that occur with $c \log n$ or
fewer atoms from $D$, with root $x_{\alpha n}$, AND degree $\ell$ at the root, when $|sdep(x_{\alpha n}, D)| > c \log n$,
and $G^{(h)}_{sdep}(x_{\alpha n}, D)$ is a tree. Let $P_2^{(\ell)}$ be zero when $|sdep(x_{\alpha n}, D)| \le c \log n$ or $G^{(h)}_{sdep}(x_{\alpha n}, D)$
is not a tree.

By definition, $P_2^{(\ell)} + P_1^{(\ell)}$ is never negative, and $P_1^{(\ell)} - P_2^{(\ell)}$ is never positive when $|sdep(x_{\alpha n}, D)| >
c \log n$.

Because  of  limited  independence,  and  the  use  of  probe  axioms  that  permit  a  fair  amount  of 
deviant  probe  behavior,  there  are  additional  error  terms.  When  the  h-superdependency  set  gets 
too  large,  our  limited  independence  will  fail  to  quantify  its  behavior  very  well. 

•  Let

$$P_3 = c_0 n\, \mathrm{Prob}\{|sdep^{(h)}(x_{\alpha n}, D)| > c \log n\},$$

which charges a cost of $c_0 n$ probes to the event.

•  Even longer probe excursions might occur, which are quantified by setting

$$P_4 = \sum_{t > c_0 n} \mathrm{Prob}\{probe_{\alpha n} > t\}.$$
We now analyze the expected number of probes contributed by non-tree DAGs. The error
$P_4$ captures some of these events. Consider the $h$-superdependency DAGs $G$ that are not trees
and yield $T \in Tr^{\ell}$ as their omnidirected spanning tree, where $|T| = k$.

•  Let $P_5$ be the expected number of $h$-superdependency DAGs of $c \log n$ or fewer nodes that
are not trees and occur with root $x_{\alpha n}$, and where $x_{\alpha n}$ has no non-tree edges.

The  following  two  terms  account  for  the  extra  probes  by  non-tree  edges  of  a  DAG  G. 

•  Let, for $k \le c \log n$, $B(k)$ be the indicator function for the event
($|sdep^{(h)}(x_{\alpha n}, D)| = k$ and $degr(G) > degr(T)$). Set

$$P_6 = \sum_{k \le c \log n} 2k\, E[B(k)],$$

so that we take a penalty of $2|sdep^{(h)}(x_{\alpha n}, D)|$ probes when $x_{\alpha n}$ uses at least one non-tree
probe.

•  Let, for $k \le c \log n$, $ER(k)$ be the indicator function for the event
($|sdep^{(h)}(x_{\alpha n}, D)| = k$ and $degr(G) > degr(T) + 1$). Let

$$P_7 = c_0 n \times \sum_{k \le c \log n} E[ER(k)],$$

so that we take a penalty of $c_0 n$ probes when $x_{\alpha n}$ uses two or more non-tree probes.

Finally, we must account for the errors that occur due to the differences in the models and
instances of $UH$, $DH$, and $DH_\psi$. Of the expressions listed, only $P_1$ has a contribution that is
not already asymptotically counted.

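•  Let

$$P_8 = \sum_{1 \le \ell \le c \log n}\ \sum_{\substack{T \subset (1,2,\ldots,\alpha n-1) \\ |T| \le c \log n \\ T \in Tr^{\ell}}} \left| \mathrm{Prob}_{DH_\psi}(H^{(h)}(D_T^+) \to T) - \mathrm{Prob}_{UH}(H^{(h)}(D_T^+) \to T) \right|.$$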

We conclude that

$$E[probe_{\alpha n}] = \sum_{0 \le \ell \le c \log n} E[P_1^{(\ell)}] + O\Big(\sum_{0 \le \ell \le c \log n} E[P_2^{(\ell)}] + P_3 + P_4 + P_5 + P_6 + P_7 + P_8\Big).$$
In our models, the probability that a tuple of $k$ items hashes into a local tree can vary by a
factor of only $(1 + O(k^2/n))$ (Lemma 2 of [16]). Similarly, the probability that a set of $k$ locations
satisfy our multiplicative vacancy overestimator, at specific insertion times, is, up to a factor
of $(1 + O(k^2/n))$, independent of the specific family of hash function in $DH_\psi$ (Lemma 6 of [16]).
Consequently,

$$P_8 = \sum_{1 \le k \le c \log n} s(k, \alpha n)\, O(k^2/n) = O(1/n)$$

by Theorem 1.

Furthermore,

$\sum_{0 \le \ell \le c \log n} E[P_2^{(\ell)}] = O(1/n)$ by Corollary 4;

$P_3 = O(1/n)$ by Theorem 1, Corollary 1, Theorem 2, and Corollary 2;

$P_4 = O(1/n)$ by the definitions of $DH$ and $DH_\psi$;

$P_5 + P_6 = O(1/n)$ by Theorem 2; and finally,

$P_7 = O(1/n)$ by Theorems 1 and 2, Lemma A, the definitions of $DH$ and $DH_\psi$, and the
fact that the two extra constraints on the root's probing introduce an additional factor
of $O(1/n^2)$ into the accounting. ∎

3.  Extensions 

We  now  apply  the  techniques  from  Section  2  to  see  what  else  can  be  deduced  for  variants  of 
double  hashing  with  full  and  limited  randomness,  and  linear  probing  and  uniform  hashing  in 
the  case  of  limited  randomness. 

3.1  Higher  moments 

Our  generic  error  bounds  also  apply  to  such  performance  measures  as  the  expected  r-th  moment 
of  the  probe  count.  The  basic  reason  for  this  fact  is  a  convenient  representation  of  moments  in 
our  dependency  set  algebra.  We  may  write: 


$$E[probe^r_{\alpha n}] = \sum_{\ell \le c \log n} \left(\ell^r - (\ell - 1)^r\right) E[P_1^{(\ell)}] + O(1/n).$$

The cutoff $c \log n$ needs only a nominal adjustment, due to the radius of convergence for the
generating function with coefficients $s(k, \alpha n)$, and the requirements for $DH$ and $DH_\psi$ must be
modified to guarantee that for some fixed $c_0$ that depends on $\alpha$,

$$\sum_{t > c_0 n} t^{r-1}\, \mathrm{Prob}\Big\{ \Big| \bigcup_{i=1}^{t} \{p(x, i)\} \Big| < \alpha n + 1 \Big\} < 1/n.$$

Corollary 5. Let $r$ be fixed, and load $\alpha < 1$ be fixed. Then for sufficiently large fixed $c$,
depending on $\alpha$ and $r$,

$$E_{DH_\psi}[probe^r_{\alpha n}] = E_{UH}[probe^r_{\alpha n}] + O(1/n),$$

provided $\psi > c \log n$. ∎
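
Moment calculations of this kind rest on the elementary tail-sum identity for a positive integer-valued random variable $X$, namely $E[X^r] = \sum_{\ell \ge 1} (\ell^r - (\ell-1)^r)\, \mathrm{Prob}\{X \ge \ell\}$. The following minimal Python sketch checks that identity exactly with rational arithmetic. The truncated geometric law used here is only an illustrative stand-in for a probe-count distribution (an assumption of the sketch, not the distribution analyzed above), and all function names are hypothetical.

```python
from fractions import Fraction


def truncated_geometric(alpha, cutoff):
    """Illustrative law: Pr{X = t} proportional to alpha**(t-1) for 1 <= t <= cutoff."""
    weights = {t: alpha ** (t - 1) for t in range(1, cutoff + 1)}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def moment_direct(pmf, r):
    """E[X^r] computed directly from the probability mass function."""
    return sum(p * t ** r for t, p in pmf.items())


def moment_from_tails(pmf, r):
    """E[X^r] computed as the sum over l >= 1 of (l^r - (l-1)^r) * Pr{X >= l}."""
    top = max(pmf)
    tail = Fraction(0)
    result = Fraction(0)
    for level in range(top, 0, -1):
        tail += pmf.get(level, Fraction(0))  # now tail == Pr{X >= level}
        result += (level ** r - (level - 1) ** r) * tail
    return result


if __name__ == "__main__":
    law = truncated_geometric(Fraction(1, 2), 12)
    for r in (1, 2, 3):
        print(r, moment_direct(law, r) == moment_from_tails(law, r))
```

Because the identity weights each tail probability by $\ell^r - (\ell-1)^r = O(\ell^{r-1})$, a tail condition with the extra factor $t^{r-1}$ is the natural requirement when passing from the first moment to the $r$-th.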

3.2  Tree  statistics 

An immediate outcome of our limit studies is that there is a probability distribution on
dependency DAGs, and this distribution is asymptotically the same in $UH$, $DH$ and $DH_\psi$ for
sufficiently large $\psi = O(\log n)$. Recall that a hash tree structure is defined to impose, on the
vertices, a total order that is consistent with the partial order dictated by the tree structure
itself. A reasonably representative distribution theorem is the following.

Theorem 4. Let $T$ be a fixed hash tree structure of $k$ vertices. Then the probability, in $UH$,
$DH$, and $DH_\psi$ for sufficiently large $\psi = O(\log n)$, that the dependency DAG $G(x_{\alpha n}, D)$ is
isomorphic  to  T  is  ^  11^,  (1  -q)(o-  ^y)*-1(l  +  ^ ) .  |The  error  factor  comes  from  Appendix 

A  or  from  [16].  The  principal  term  can  be  derived  from  the  proof  of  Theorem  1  in  [16],  or  direct 
calculation  in  UH . 

3.3  The criterion $M^{(2)}$

As indicated earlier, the probability distribution for our vacancy estimator $M^{(h)}$ converges much
faster than the weak calculations we provide. It is even interesting to observe just how far the
vacancy criterion $M^{(2)}$ can be used to attain optimal probe performance for $DH_\psi$. Surprisingly,
perhaps, it turns out that the full use of the $M^{(2)}$ criterion gives a constant for the expected
number of 2-superdependency trees for loads $\alpha < .669$. The details are omitted.

The basic facts are that we may compute $q^{(2)}$ explicitly to get $q^{(2)}(\alpha n) = e^{1 - 2\alpha - e^{-\alpha}}(1 + 1/n)$,
and $Q^{(2)}(\alpha n) = (1 + e^{-\alpha})e^{1 - e^{-\alpha}} - 2$. The value .669 is obtained from a numerical overestimate to
the rescaled differential equation associated with equation (2) and as stated with $\beta = 1$. A
change to $\beta = 3$, for example, would yield a conservative cutoff for optimal performance bounds
under $M^{(2)}$.
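
As a consistency check on the two closed forms, assume they read $q^{(2)} \approx e^{1 - 2\alpha - e^{-\alpha}}$ and $Q^{(2)} = (1 + e^{-\alpha})e^{1 - e^{-\alpha}} - 2$ (this reading, and the interpretation of $Q^{(2)}$ as the integral of $q^{(2)}$ from 0 to $\alpha$, are assumptions of this sketch). Differentiating the second expression gives the first, and the second vanishes at $\alpha = 0$. The Python sketch below confirms this numerically with Simpson's rule; the function names are hypothetical.

```python
import math


def q2(alpha):
    """Assumed reading of the density-like quantity, without the (1 + 1/n) factor."""
    return math.exp(1.0 - 2.0 * alpha - math.exp(-alpha))


def Q2(alpha):
    """Assumed reading of the cumulative quantity."""
    return (1.0 + math.exp(-alpha)) * math.exp(1.0 - math.exp(-alpha)) - 2.0


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.669):
        print(alpha, Q2(alpha), simpson(q2, 0.0, alpha))
```

The check says nothing about where the threshold .669 comes from; it only confirms that the two expressions are mutually consistent under the stated reading.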

3.4  Tertiary  clustering 

In tertiary clustering (which is also called 2-ary clustering), the probe sequence is defined,
for $j > 2$, by $p(x, j) = f(p(x, 1), p(x, 2), j)$, where the probes $p(x, 1)$ and $p(x, 2)$ are assumed
to be fully random, and $f$ is a fully random uniformly distributed hashing function mapping
$[0, n^2 - 1] \times$ probe index into $[0, n - 1]$. See, for example, [3].

Our performance bounds show that any (reasonable) deterministic $f$ will achieve the same
performance as tertiary clustering; the randomness provided by the first two probes is sufficient,
as long as the requirements for $DH_\psi$ are satisfied.
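
The shape of such a probe sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first two probes are drawn from a seeded pseudorandom generator, the table size $n$ is taken to be prime, and the deterministic rule `stride_rule` (an arithmetic progression whose stride is derived from the second probe) is a hypothetical choice of $f$, not one prescribed by the text.

```python
import random


def make_tertiary_probe(n, f, seed=0):
    """Return probe(x, j): probes 1 and 2 are random, later probes are f(p1, p2, j) mod n."""
    rng = random.Random(seed)
    first_two = {}

    def probe(x, j):
        if x not in first_two:
            first_two[x] = (rng.randrange(n), rng.randrange(n))
        p1, p2 = first_two[x]
        if j == 1:
            return p1
        if j == 2:
            return p2
        return f(p1, p2, j) % n

    return probe


def stride_rule(n):
    """Hypothetical deterministic f: step away from p1 with a nonzero stride taken from p2."""
    def f(p1, p2, j):
        step = 1 + p2 % (n - 1)      # stride in [1, n-1], invertible mod prime n
        return p1 + (j - 2) * step   # j = 3, 4, ... visits p1 + step, p1 + 2*step, ...
    return f


def insert_all(keys, n, probe):
    """Insert keys by open addressing; return the table and the probe count per key."""
    table = [None] * n
    counts = []
    for x in keys:
        for j in range(1, n + 2):
            slot = probe(x, j)
            if table[slot] is None:
                table[slot] = x
                counts.append(j)
                break
        else:
            raise RuntimeError("probe sequence failed to find a vacancy")
    return table, counts


if __name__ == "__main__":
    n = 101
    probe = make_tertiary_probe(n, stride_rule(n), seed=1)
    table, counts = insert_all(range(50), n, probe)
    print(sum(counts) / len(counts))
```

With $n$ prime, probes $1, 3, 4, \ldots, n+1$ of any key visit every table location, so this particular rule cannot fail to find a vacancy in a table that is not full.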

3.5  Uniform  hashing 

Uniform hashing is generally viewed as an idealized model because of its computational
requirements. Each probe requires the evaluation of a random probe (or worse yet, each element must
be mapped into a permutation). We can use $\psi$-wise independence to define a reasonably uniform
scheme that is robust and has optimal performance, up to negligible errors, for any load factor
bounded below 1.

For data sets of size $|D| = \alpha n$, the construction in Section 1.1 of [16] gives a universal set
of linear hash functions that, with high probability, maps $D$ without any collisions into, say,
$[0, p-1]$, where $p \approx n^4$. We can then hash $[0, p-1]$ into $[0, n-1]$ by defining a $\psi$-wise independent
family $F_\psi$ with domain $[0, p-1] \times [1, \psi]$, so that $p(x, j) = f(\psi \times x + j) \bmod n$, where $f \in F_\psi$.

As stated, there is a slight technical flaw. These hash functions will include a small number
of defective functions that fail to meet the robust coverage requirements. Even the presence of
one defective hash function will spoil the expected performance, if it fails to hash $D$, no matter
how many probes are used. Accordingly, a scheme that uses $UH_\psi$ should either switch to linear
probing, or rehash the entire data set, once some large number of probes (such as $n$, or $n^\epsilon$) have
been expended in an unsuccessful attempt to insert a given datum. Alternatively, a more robust
formulation of $p(x, j)$, for our example, might be $p(x, j) = f(\psi \times x + (j \bmod \psi)) + j \bmod n$, where
$f \in F_\psi$ and $\psi$ is relatively prime to $n$.
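
A minimal Python sketch of the first formulation, with a fallback to linear probing, is given below. It swaps in a standard construction that is not taken from the text: a polynomial of degree $\psi - 1$ with uniformly random coefficients over a prime field, which is $\psi$-wise independent on that field (the final reduction mod $n$ makes the values only approximately uniform). The modulus `MERSENNE_P`, the class `PolyHash`, and the choice to fall back after $\psi$ probes rather than after $n$ or $n^\epsilon$ probes are all assumptions made for illustration.

```python
import random

MERSENNE_P = (1 << 61) - 1  # prime modulus; every argument psi*x + j is assumed smaller


class PolyHash:
    """Random polynomial of degree psi-1 over Z_P (psi-wise independent on Z_P)."""

    def __init__(self, psi, rng):
        self.coeffs = [rng.randrange(MERSENNE_P) for _ in range(psi)]

    def __call__(self, y):
        acc = 0
        for c in self.coeffs:  # Horner evaluation
            acc = (acc * y + c) % MERSENNE_P
        return acc


def probe_location(f, psi, n, x, j):
    """p(x, j) = f(psi*x + j) mod n for j <= psi; afterwards, linear probing."""
    if j <= psi:
        return f(psi * x + j) % n
    return (f(psi * x + psi) + (j - psi)) % n


def insert_keys(keys, n, psi, f):
    """Insert keys by open addressing; return the table and the probe count per key."""
    table = [None] * n
    counts = []
    for x in keys:
        for j in range(1, psi + n + 1):
            slot = probe_location(f, psi, n, x, j)
            if table[slot] is None:
                table[slot] = x
                counts.append(j)
                break
        else:
            raise RuntimeError("table is full")
    return table, counts


if __name__ == "__main__":
    rng = random.Random(3)
    n, psi = 64, 8
    table, counts = insert_keys(range(100, 148), n, psi, PolyHash(psi, rng))
    print(max(counts), sum(counts) / len(counts))
```

The linear-probing tail guarantees coverage of the whole table, which is one simple way to meet a robustness requirement; rehashing the data set with a fresh function would be the other option mentioned above.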

In any case, we have the following corollary, which is subject to a robustness requirement
such as that stated in the definition of $DH$.

Corollary 6. Let $r$ be fixed, and load $\alpha < 1$ be fixed. Then for sufficiently large fixed $c$,
depending on $\alpha$ and $r$,

$$E_{UH_\psi}[probe^r_{\alpha n}] = E_{UH}[probe^r_{\alpha n}] + O(1/n),$$

provided $\psi > c \log n$, and the hashing includes some form of robustness guarantee.

4.  Linear  Probing 

Linear  probing  is  a  hashing  scheme  that  trades  some  of  the  efficiency  of  double  hashing  for 
the  computational  efficiency  of  having  only  one  non-trivial  evaluation  per  key  reference.  It 
originated at a time when computation was more expensive, and search was somewhat local and
sequential, which may still be the case for some storage devices. In this scheme, p(x,k) =
f(x)-k+ 1  mod  n.  Knuth  [11]  showed  that  the  expected  insertion  cost  for  xan_^_i  is  ^  XZ*>o (^+ 
l)!(°”fc_1)?7_fc,  which  is  less  than  (1  +  1/(1  -a)2)/2. 
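
For reference, the classical statement of Knuth's result (The Art of Computer Programming, Vol. 3, Section 6.4) for a table of $M$ locations holding $N$ keys gives the expected number of probes for the next insertion as $\frac{1}{2}(1 + Q_1(M, N))$ with $Q_1(M, N) = \sum_{k \ge 0} (k+1) \frac{N(N-1)\cdots(N-k+1)}{M^k}$, which is below $\frac{1}{2}(1 + 1/(1-\alpha)^2)$ for $\alpha = N/M$. The indexing may differ from the expression above. The Python sketch below evaluates this form exactly and compares it with a brute-force average over all hash assignments for tiny tables; the function names are hypothetical, and probing proceeds downward as in $p(x, k) = f(x) - k + 1 \bmod n$.

```python
from fractions import Fraction
from itertools import product


def knuth_insertion_cost(m, n_keys):
    """(1 + Q_1(m, n_keys)) / 2 with Q_1 = sum_k (k+1) * n_keys^(falling k) / m^k."""
    total = Fraction(0)
    falling = Fraction(1)  # n_keys (n_keys - 1) ... (n_keys - k + 1) / m^k
    for k in range(n_keys + 1):
        total += (k + 1) * falling
        falling *= Fraction(n_keys - k, m)
    return (1 + total) / 2


def exhaustive_insertion_cost(m, n_keys):
    """Average probes for key number n_keys + 1 over all m^(n_keys + 1) hash choices."""
    assert n_keys < m
    total = 0
    for hashes in product(range(m), repeat=n_keys + 1):
        occupied = [False] * m
        for h in hashes[:-1]:
            pos = h
            while occupied[pos]:
                pos = (pos - 1) % m  # linear probing, stepping downward
            occupied[pos] = True
        pos, probes = hashes[-1], 1
        while occupied[pos]:
            pos = (pos - 1) % m
            probes += 1
        total += probes
    return Fraction(total, m ** (n_keys + 1))


if __name__ == "__main__":
    print(knuth_insertion_cost(4, 2), exhaustive_insertion_cost(4, 2))
    print(float(knuth_insertion_cost(1000, 500)))
```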

Let $LP$ denote the model of linear probing with fully independent uniformly distributed hash
functions. Let $LP_\psi$ denote a model of linear probing with, for simplicity, uniformly distributed
and fully $\psi$-wise independent hash functions. As before, we construct a formulation for the
expected number of probes that will expose an approximate isometry between $LP$ and $LP_\psi$.

Let $X(l)$ bound the probability that a table of $n$ locations will have more than $l$ consecutive
table locations occupied after $\alpha n - 1$ items are inserted in $LP$ (or $LP_\psi$). It is convenient to use
$X(l)$ to analyze a slightly different problem, which involves hashing $k$ items into $[1, 2l]$ without
wraparound: if a bottom segment $[1, l_0]$ becomes full, then items subsequently hashing into the
segment are discarded.

Let $W(k)$ be the expected number of probes to insert a key $x$ into $[1, 2l]$, when the first
probe of $x$ is to location $l + 1$, and $k$ keys have already been inserted among
the locations $[1, 2l]$ according to linear probing. With probability $1 - X(l)$, the keys inserted
outside of this interval cannot affect the hashing of $x$. We will require $\psi$ to be so large that
$W(k)$ is the same in $LP$ and $LP_\psi$ for relevant $k$. Then by inclusion-exclusion,


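$$E[probe_{\alpha n}] = \sum_{\substack{k < 2l \\ 0 \le j \le 4\lceil el \rceil - k}} W(k) \binom{\alpha n - 1}{k + j} \left(\frac{2l}{n}\right)^{k+j} \binom{k + j}{j} (-1)^j + O\left( n X(l) + \sum_{k < 2l} W(k) \binom{\alpha n - 1}{4\lceil el \rceil} \left(\frac{2l}{n}\right)^{4\lceil el \rceil} \binom{4\lceil el \rceil}{k} \right) \tag{23}$$
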

where both $E[probe_{\alpha n}]$ and $X(l)$ are computed in the same $LP$ or $LP_\psi$. We charge a penalty of
$n$ probes if the table has more than $l$ consecutive locations filled, for some interval, at the insertion
time of $x_{\alpha n}$. The sum

$$\sum_{0 \le j \le 4\lceil el \rceil - k} \binom{k + j}{j} \binom{\alpha n - 1}{k + j} \left(\frac{2l}{n}\right)^{k+j} (-1)^j$$

uses inclusion-exclusion to overestimate the probability that exactly $k$ items will have a first
probe hashing into the interval $[1, 2l]$, among locations $[0, n - 1]$. The inclusion-exclusion factor
is  due  to  the  fact  that  the  summation  begins  over  /-tuples.  The  bound  on  j  is  selected  to 
be  large  enough  to  give  a  satisfactory  error,  which  is  just  the  last  term,  and  to  stay  within  the 
freedom $\psi = 4el$.
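
The mechanism behind such a truncated alternating sum can be seen in the fully independent case, where everything is computable exactly. For $N$ independent trials with success probability $p$, the probability of exactly $k$ successes equals $\sum_{j \ge 0} (-1)^j \binom{k+j}{j} \binom{N}{k+j} p^{k+j}$; stopping after an even $j$ overestimates, stopping after an odd $j$ underestimates (the Bonferroni inequalities), and the error is at most the first omitted term. The Python sketch below checks these facts with rational arithmetic. It is an illustration for fully independent trials only, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def exact_k_hits(n_trials, p, k):
    """Binomial probability of exactly k successes among n_trials independent trials."""
    return comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)


def truncated_inclusion_exclusion(n_trials, p, k, j_max):
    """Alternating sum over (k+j)-tuples, truncated after the term j = j_max."""
    return sum(
        (-1) ** j * comb(k + j, j) * comb(n_trials, k + j) * p ** (k + j)
        for j in range(j_max + 1)
    )


def first_omitted_term(n_trials, p, k, j_max):
    """Magnitude of the term j = j_max + 1, which bounds the truncation error."""
    m = k + j_max + 1
    return comb(m, k) * comb(n_trials, m) * p ** m


if __name__ == "__main__":
    n_trials, p, k = 30, Fraction(1, 10), 2
    exact = exact_k_hits(n_trials, p, k)
    for j_max in range(6):
        approx = truncated_inclusion_exclusion(n_trials, p, k, j_max)
        print(j_max, float(approx - exact), float(first_omitted_term(n_trials, p, k, j_max)))
```

Only the expected number of $(k+j)$-tuples enters each term, which is why a model with enough independence to handle tuples of the largest size used in the sum reproduces the same value.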

The value for $W$ is the same in $LP$ and $LP_\psi$ since $k$ is suitably bounded, as is the expected
number of $(k + j)$-tuples that initially probe $[1, 2l]$. The only model dependent issue is how
large $l$ has to be in $LP_\psi$ so that $X(l)$ and the inclusion-exclusion errors will be small.

The simplest way to guarantee that $X(l)$ is small is to use Chernoff-Hoeffding bounds for
limited independence.

Evidently, $X(l) \le n\, \mathrm{Prob}\{$at least $l$ items in $D$ have their first probe in $[1, l]\}$, since the
factor of $n$ accounts for all shifts of $[1, l]$. But this probability describes a large deviation
for a sum of $\alpha n$ $\psi$-wise independent Bernoulli trials, each having probability $l/n$ of success.

From a weak application of Theorem 5 of [17], we attain that if $\psi > l(1 - \alpha)$, then


Probfat  least  /  items  in  D  have  their  first  probe  in  [1,  /]}  <  e 


-jl -cc)2I 


It follows that $X(l)$ is nominal in $LP$ and $LP_\psi$ when $\psi = c \log n$, for suitably large fixed $c$.
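
For intuition about the size of this tail, the fully independent case can be computed directly. With $\alpha n$ independent trials of success probability $l/n$ the mean is $\mu = \alpha l$, and the standard multiplicative Chernoff bound $\mathrm{Prob}\{X \ge (1+\delta)\mu\} \le e^{-\delta^2 \mu / (2 + \delta)}$ with $\delta = (1-\alpha)/\alpha$ gives $e^{-(1-\alpha)^2 l / (1+\alpha)}$. The Python sketch below compares that bound with the exact binomial tail. It covers fully independent trials only; the limited-independence bound of [17] is not reproduced here, and the function names are hypothetical.

```python
import math


def binomial_upper_tail(trials, p, threshold):
    """Pr{Bin(trials, p) >= threshold}, summed in log space to avoid overflow."""
    total = 0.0
    for i in range(threshold, trials + 1):
        log_term = (
            math.lgamma(trials + 1) - math.lgamma(i + 1) - math.lgamma(trials - i + 1)
            + i * math.log(p) + (trials - i) * math.log(1.0 - p)
        )
        total += math.exp(log_term)
    return total


def chernoff_bound(alpha, l):
    """exp(-delta^2 mu / (2 + delta)) with mu = alpha*l and delta = (1 - alpha)/alpha."""
    return math.exp(-((1.0 - alpha) ** 2) * l / (1.0 + alpha))


if __name__ == "__main__":
    n, alpha = 2000, 0.5
    trials = int(alpha * n)
    for l in (20, 40, 80):
        tail = binomial_upper_tail(trials, l / n, l)
        # n * tail is the union bound over all shifts of the interval
        print(l, n * tail, n * chernoff_bound(alpha, l))
```

Both quantities decay exponentially in $l$, so taking $l$ to be a suitable multiple of $\log n$ makes the union bound over all $n$ shifts polynomially small.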
The remaining errors in the summation (23) are bounded by the inclusion-exclusion terms
where $j = 4\lceil el \rceil - k$, which are bounded by
Since all errors are exponentially small in $l$, we have the following.


Theorem  5. 

For any constant $w$ and any fixed load $\alpha < 1$, there is a constant $c$ such that linear probing
with a hash function chosen from a set of $c \log n$-wise independent hash functions results in an
expected insertion cost that exceeds that of $LP$ by at most $n^{-w}$. ∎

We deduce that $LP$ and $LP_\psi$ have, apart from a polynomially small error, the same expected
$r$-th moment of the probe count, for any fixed $r$.

5.  Conclusions 

We  have  shown  that  in  double  hashing,  a  universal  family  of  hash  functions,  that  with  high 
probability provides $c \log n$-wise independence, will give optimal performance for any fixed load
bounded  below  1.  The  notion  of  double  hashing  is  generalized  to  include  almost  any  reasonable 
probe  scheme.  Similarly,  linear  probing  incurs  no  loss  of  performance  when  such  hash  functions 
are  used.  These  optimality  results  apply  to  the  expected  r-th  moment  of  the  probe  count,  for 
any  fixed  r. 

As noted in [16], a pool of $O(\log \log m + \log^2 n)$ random bits is sufficient to achieve suitably
random universal families. The hash functions of [18] show that it is indeed possible to program
hash functions that exhibit, with high probability, $c \log n$-wise independence on the subset of keys
$D$, execute only a constant number of arithmetic operations to hash a key, and use
$O(\log \log m + \log^2 n)$ random bits.

These results comprise a significant step toward understanding why extremely simple
functions seem to perform so well when used to double hash arbitrary values into a partially filled
table. Nevertheless, there is still a gap between the use of $\log n$-wise independent hash
functions, and those that are typically used; yet one may well wonder if real data, when hashed by
standard hash functions, might, in fact, exhibit the statistics of $\log n$-wise independence.

Our proof technique analyzed local and global hashing interactions separately, and used
analytic tools to measure complicated but weakly correlated events in terms of simpler independent
processes. Bundling and thinning techniques were used to eliminate (spurious) combinatorial
explosions from more naive counting formulations. Surely these methods can be applied to
other probabilistic processes that exhibit weak correlations or that might only be supported by a
source of limited randomness.

Appendix  A 

Lemma A. Adapted from [16]. Let $G$ be a superdependency tree of $k$ vertices, and let
$S = (S_1, S_2, \ldots, S_k)$ be a sequence of $k$ distinct elements in $U$. Then for $k = O(n^{1/3})$,

1)  The probability that the superdependency DAG $G_{sdep}(S) = G$ under $DH_\psi$, for $\psi > 3k$, is

$$\frac{1 + O(k^2/n)}{n^{k-1}}.$$

2)  The probability that $S$ hashes into a superdependency DAG that properly contains the
structure $G$ as its omnidirected spanning tree, under $DH_\psi$ for $\psi > 3k$, is bounded by
$O(k^3/n^k)$.

Proof: Let $G = (V, E)$, where $|V| = k$. Let $Tr = (V, E_{Tr})$ be the omnidirected spanning
tree discovered by the search. Let the sequence $S = (S_1, S_2, \ldots, S_k)$ list the vertices in the
order of discovery by the DFS, with the root first. We embed the vertices in this processing
order.

1)  Suppose that $G$ is a tree, whence $E_{Tr} = E$. The root $S_1$ can be embedded anywhere; there
are no constraints. Consider, now, the embedding of $S_j$, for some $j > 1$. Let $S_j$ have the
parent $S_i$ in $G$, and suppose that vertex $S_j$ has $r$ children in $G$, has indegree 1 and is the
$h$-th probe of $S_i$. Then the probability that the $h$-th probe of $S_i$ is to a vacant location
(with respect to $S_1, S_2, \ldots, S_{j-1}$) is between 1 and $1 - O(k/n)$, and the probability that
$p(S_j, r + 1) = z$ given that $p(S_i, h) = z$ is $\frac{1}{n - O(1)}$.

If $S_j$ is a dummy root with $r$ children, then the probability that its $(r + 1)$-st probe does not
collide is between 1 and $1 - O(k/n)$. However, each dummy root will be accompanied by a cross
edge to a previously embedded vertex. This cross edge will comprise a probe to a previously
embedded vertex, and will have a probability of $\frac{1}{n - O(1)}$ of occurring as specified. The vertex
issuing this probe will thus have two constraints (or one if it is a dummy root): one for the
unsuccessful cross edge probe, and one for a successful insertion into an unsuccessful probe
location of its parent (if present).

We appeal to the independence of individual probe sequences to multiply all $k$ factors to
get a value between $\left(\frac{1}{n}\right)^{k-1}(1 + O(k/n))$ and $(1 - O(k^2/n))\left(\frac{1}{n}\right)^{k-1}$, which proves 1).

2)  If $G(S)$ is not a tree then $E \ne E_{Tr}$ and the nodes of $Tr$ have different embeddings since
collisions occurred. The tree construction is similar, but some nodes $x \in Tr$ will have gaps
in their probe sequences $p(x, 1), p(x, 2), \ldots$ to their tree children, since edges to nodes that
are already embedded or that have embedding locations already specified will be omitted.
Now, the initial probe sequences for any $k$ items are mutually independent, as long as the
total number of probes is bounded by $\psi$. Consequently, the probability that $V$ hashes into
a DAG that yields $E_{Tr}$ as its search tree, under traversal by the DFS, is at most $\prod_{j=1}^{k} pr_j$,
where $pr_j$ overestimates the probability that the $j$-th vertex is hashed to have the correct
non-tree probes to previously determined locations.

Let $S_j$ have $h_j$ tree edges. To upper bound $pr_j$, we distinguish among three cases: $S_j$
has no non-tree edges, $S_j$ has fewer than $h_j + 2$ non-tree edges and at least one, and $S_j$
has at least $h_j + 2$ non-tree edges. Note that if no location can be probed twice in a
probe sequence (as is the case in double hashing) then cases two and three combine into
the case $S_j$ has at least one and at most $k - 1$ non-tree edges.

The first case is as the overestimate in 1), and contributes a probability of at most 1 to
$pr_1$, and at most $\frac{1}{n}(1 + O(1/n))$ to $pr_j$, for $j > 1$.


In the second case, there are different DAG structures, depending on which probe count
within $(h_j + 2, \ldots, 2h_j + 2)$ is the last and actually embeds $S_j$. Summing over all possible
last probe counts, over the possible probe counts that correspond to the first non-tree edge,
which is among the first $h_j + 1$ probes, and the set of possible destinations for this first
non-tree edge (which must be to a location already probed by $S_j$ or some other item in $S$),
we  get  ag  an  overestimate  for  the  probability  contributed  to  pr j  by  case  2. 


In the third case, there must be two consecutive non-tree edges among the first $2h_j + 2$
probes of $S_j$. These edges may go to previously embedded items or collide with earlier
probes of $S_j$. To estimate this contribution to $pr_j$, we ignore the requirement that $S_j$ must
be placed successfully and focus on the expected number of ways a first pair of such probes
could occur, which is bounded by $(2h_j + 1)\, O(k^2/n^2)$.


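Combining like terms from the three cases into factors and multiplying gives

$$\prod_{1 < j \le k} \left( \frac{1}{n} + O\left( \frac{(h_j + 1) k^2}{n^2} \right) \right) = \left( \frac{1}{n} \right)^{k-1} \left( 1 + O(k^3/n) \right),$$

and hence the probability that $G$ results from the traversal of a non-tree DAG is at most

$$\left( \frac{1}{n} \right)^{k-1} \left( 1 + O(k^3/n) \right) - \left( 1 - O(k^2/n) \right) \left( \frac{1}{n} \right)^{k-1} = O(k^3/n^k).$$ ∎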